Latest papers

20 papers
attack arXiv Mar 31, 2026 · 8d ago

Dummy-Aware Weighted Attack (DAWA): Breaking the Safe Sink in Dummy Class Defenses

Yunrui Yu, Xuxiang Feng, Pengda Qin et al. · Tsinghua University · University of Macau +1 more

Novel adversarial attack targeting dummy-class defenses by simultaneously attacking true and dummy labels with adaptive weighting

Input Manipulation Attack vision
PDF
attack arXiv Mar 26, 2026 · 13d ago

A Unified Spatial Alignment Framework for Highly Transferable Transformation-Based Attacks on Spatially Structured Tasks

Jiaming Liang, Chi-Man Pun · University of Macau

Spatial transformation-based adversarial attacks on segmentation and detection models via synchronized label-input alignment

Input Manipulation Attack vision
PDF
defense arXiv Mar 26, 2026 · 13d ago

Efficient Preemptive Robustification with Image Sharpening

Jiaming Liang, Chi-Man Pun · University of Macau

Image sharpening as a simple, efficient preemptive defense that robustifies benign images against adversarial perturbations before attacks occur

Input Manipulation Attack vision
PDF
defense arXiv Mar 25, 2026 · 14d ago

High-Fidelity Face Content Recovery via Tamper-Resilient Versatile Watermarking

Peipeng Yu, Jinfeng Xie, Chengfu Ou et al. · Jinan University · University of Macau +2 more

Embeds semantic watermarks in face images for copyright protection, pixel-level deepfake localization, and content recovery after manipulation

Output Integrity Attack vision generative
PDF
tool arXiv Mar 23, 2026 · 16d ago

FeatDistill: A Feature Distillation Enhanced Multi-Expert Ensemble Framework for Robust AI-generated Image Detection

Zhilin Tu, Kemou Li, Fengpeng Li et al. · University of Electronic Science and Technology of China · University of Macau +2 more

Multi-expert ensemble detector for AI-generated images robust to degradations, using CLIP/SigLIP transformers with feature distillation

Output Integrity Attack vision generative
PDF
defense arXiv Mar 2, 2026 · 5w ago

Process Over Outcome: Cultivating Forensic Reasoning for Generalizable Multimodal Manipulation Detection

Yuchen Zhang, Yaxiong Wang, Kecheng Han et al. · Xi’an Jiaotong University · Hefei University of Technology +3 more

Proposes REFORM, a forensic-reasoning framework with curriculum learning and RL to generalize multimodal deepfake detection

Output Integrity Attack multimodal vision nlp generative
PDF
defense arXiv Feb 6, 2026 · 8w ago

AEGIS: Adversarial Target-Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models

Fengpeng Li, Kemou Li, Qizhou Wang et al. · University of Macau · King Abdullah University of Science and Technology +2 more

Defends diffusion model concept erasure against adversarial prompt reactivation attacks via semantic-center-targeting adversarial erasure targets and gradient projection

Input Manipulation Attack vision generative
PDF Code
defense arXiv Feb 4, 2026 · 9w ago

SIDeR: Semantic Identity Decoupling for Unrestricted Face Privacy

Zhuosen Bao, Xia Du, Zheng Lin et al. · Xiamen University of Technology · University of Hong Kong +8 more

Generates unrestricted adversarial faces using diffusion models to evade facial recognition with 99% black-box success rate

Input Manipulation Attack vision generative
PDF
defense arXiv Jan 28, 2026 · 10w ago

MARE: Multimodal Alignment and Reinforcement for Explainable Deepfake Detection via Vision-Language Models

Wenbo Xu, Wei Lu, Xiangyang Luo et al. · Sun Yat-Sen University · State Key Laboratory of Mathematical Engineering and Advanced Computing +1 more

Proposes VLM-based deepfake detector using RLHF and multimodal alignment rewards for explainable forgery reasoning and spatial localization

Output Integrity Attack vision multimodal
PDF
benchmark arXiv Jan 24, 2026 · 10w ago

OTI: A Model-free and Visually Interpretable Measure of Image Attackability

Jiaming Liang, Haowei Liu, Chi-Man Pun · University of Macau · Chongqing University of Posts and Telecommunications

Proposes OTI, a model-free texture-based metric for quantifying per-image adversarial vulnerability without model access

Input Manipulation Attack vision
PDF Code
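
The exact OTI formulation is defined in the paper; as a loose illustration of what a "model-free, texture-based" per-image statistic can look like (no classifier access, only image gradients), here is a hypothetical sketch. The function `texture_score` and its patch-variance construction are my own illustration, not the paper's metric.

```python
import numpy as np

def texture_score(img: np.ndarray, patch: int = 8) -> float:
    """Loose, model-free texture statistic for a grayscale image in [0, 1].

    Measures how unevenly local gradient energy is distributed across
    non-overlapping patches. The real OTI metric differs; this only
    shows that such a score needs no model queries.
    """
    gy, gx = np.gradient(img.astype(np.float64))
    energy = gx ** 2 + gy ** 2
    h, w = energy.shape
    h, w = h - h % patch, w - w % patch
    blocks = energy[:h, :w].reshape(h // patch, patch, w // patch, patch)
    return float(blocks.mean(axis=(1, 3)).std())  # spread of per-patch texture

rng = np.random.default_rng(0)
flat = np.full((64, 64), 0.5)    # untextured image: score is 0
noisy = rng.random((64, 64))     # highly textured image: higher score
assert texture_score(flat) < texture_score(noisy)
```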
attack arXiv Jan 17, 2026 · 11w ago

Gradient Structure Estimation under Label-Only Oracles via Spectral Sensitivity

Jun Liu, Leo Yu Zhang, Fengpeng Li et al. · University of Macau · National Institute of Informatics +2 more

Hard-label black-box adversarial attack using frequency-domain initialization and pattern-driven optimization to recover gradient sign information

Input Manipulation Attack vision
PDF Code
defense arXiv Jan 12, 2026 · 12w ago

Universal Adversarial Purification with DDIM Metric Loss for Stable Diffusion

Li Zheng, Liangbin Xie, Jiantao Zhou et al. · University of Macau · Shenzhen Institute of Advanced Technology

Defeats anti-fine-tuning image protections on Stable Diffusion by minimizing DDIM inversion reconstruction error to purify adversarial noise

Output Integrity Attack vision generative
PDF Code
defense arXiv Jan 3, 2026 · Jan 2026

IO-RAE: Information-Obfuscation Reversible Adversarial Example for Audio Privacy Protection

Jiajie Zhu, Xia Du, Xiaoyuan Liu et al. · Xiamen University of Technology · Sichuan University +2 more

Reversible adversarial audio perturbations fool ASR systems into wrong transcriptions while authorized parties recover the original audio losslessly

Input Manipulation Attack audio
PDF
defense arXiv Nov 20, 2025 · Nov 2025

How Noise Benefits AI-generated Image Detection

Jiazhen Yan, Ziqiang Li, Fan Wang et al. · Nanjing University of Information Science and Technology · University of Macau +1 more

Proposes PiN-CLIP, a noise-guided CLIP fine-tuning method that suppresses spurious shortcuts for generalizable AI-generated image detection

Output Integrity Attack vision generative
PDF
defense arXiv Nov 17, 2025 · Nov 2025

DGS-Net: Distillation-Guided Gradient Surgery for CLIP Fine-Tuning in AI-Generated Image Detection

Jiazhen Yan, Ziqiang Li, Fan Wang et al. · Nanjing University of Information Science and Technology · University of Macau

Novel gradient surgery framework fine-tunes CLIP for AI-generated image detection while preventing catastrophic forgetting

Output Integrity Attack vision multimodal
PDF
attack arXiv Nov 15, 2025 · Nov 2025

Dynamic Parameter Optimization for Highly Transferable Transformation-Based Attacks

Jiaming Liang, Chi-Man Pun · University of Macau

Improves black-box adversarial transferability via dynamic parameter optimization, cutting grid-search complexity from O(mn) to O(n log m)

Input Manipulation Attack vision
PDF
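
The paper's own optimization procedure is not reproduced here, but the O(mn) → O(n log m) claim has a standard shape: if the response to each of the n transformation parameters is (assumed) unimodal, a ternary search over an m-point grid needs O(log m) evaluations per parameter instead of m. A minimal sketch with a toy surrogate objective:

```python
# Hypothetical illustration of replacing per-parameter grid search with
# ternary search; the surrogate objective and grid are made up.

def ternary_search(f, grid):
    """Maximize f over a grid assuming f is unimodal on it."""
    lo, hi, evals = 0, len(grid) - 1, 0
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        evals += 2
        if f(grid[m1]) < f(grid[m2]):
            lo = m1 + 1            # maximum lies to the right of m1
        else:
            hi = m2                # maximum lies at or left of m2
    best = max(grid[lo:hi + 1], key=f)
    return best, evals

grid = list(range(100))            # m = 100 candidate parameter values
score = lambda x: -(x - 37) ** 2   # toy unimodal transferability surrogate
best, evals = ternary_search(score, grid)
assert best == 37 and evals < len(grid)   # far fewer than m evaluations
```

Exhaustive grid search over n parameters costs m evaluations each (O(mn) total); doing this per parameter gives the O(n log m) total the summary cites, under the unimodality assumption.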
attack EMNLP Oct 11, 2025 · Oct 2025

Path Drift in Large Reasoning Models: How First-Person Commitments Override Safety

Yuyi Huang, Runzhe Zhan, Lidia S. Chao et al. · Guangzhou Medical University · University of Macau

Identifies 'Path Drift' jailbreak in chain-of-thought LLMs via first-person priming, ethical evaporation, and condition chaining to bypass RLHF safety

Prompt Injection nlp
2 citations PDF
attack EMNLP Sep 23, 2025 · Sep 2025

The Ranking Blind Spot: Decision Hijacking in LLM-based Text Ranking

Yaoyao Qian, Yifan Zeng, Yuchao Jiang et al. · Northeastern University · Oregon State University +1 more

Attacks LLM-based document rankers via content injection that hijacks evaluation objectives or relevance criteria, boosting attacker documents to top positions

Prompt Injection nlp
1 citation 1 influential PDF Code
defense arXiv Aug 18, 2025 · Aug 2025

RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns

Xin Chen, Junchao Wu, Shu Yang et al. · University of Macau · Chinese Academy of Sciences +2 more

Proposes RepreGuard, detecting LLM-generated text via hidden activation patterns for robust OOD detection at 94.92% AUROC

Output Integrity Attack nlp
PDF Code
defense arXiv Aug 2, 2025 · Aug 2025

NS-Net: Decoupling CLIP Semantic Information through NULL-Space for Generalizable AI-Generated Image Detection

Jiazhen Yan, Fan Wang, Weiwei Jiang et al. · Nanjing University of Information Science and Technology · University of Macau

Proposes NULL-Space projection on CLIP features to remove semantic bias, improving generalized AI-generated image detection by 7.4%

Output Integrity Attack vision generative
PDF
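
NS-Net's exact construction is in the paper; the underlying linear-algebra idea of a null-space projection can be sketched generically. Assuming a matrix `A` whose columns span estimated "semantic" directions in a CLIP-sized feature space (both `A` and the 512-dim size are illustrative, not the paper's), projecting onto the orthogonal complement removes all components along those directions:

```python
import numpy as np

def nullspace_projector(A: np.ndarray) -> np.ndarray:
    """P = I - A A^+, the orthogonal projector onto the null space of A^T.

    Any vector multiplied by P has zero component along the columns of A.
    """
    d = A.shape[0]
    return np.eye(d) - A @ np.linalg.pinv(A)

rng = np.random.default_rng(0)
A = rng.standard_normal((512, 8))   # 8 hypothetical semantic directions
P = nullspace_projector(A)
f = rng.standard_normal(512)        # a feature vector to "de-semanticize"
g = P @ f
assert np.allclose(A.T @ g, 0, atol=1e-6)  # no remaining semantic component
```

The design choice here is the textbook projector `I - A A^+`; whatever subspace NS-Net actually estimates, the projection step removes exactly the span of the chosen directions while leaving the orthogonal (semantics-agnostic) part of the feature untouched.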