Universal Anti-forensics Attack against Image Forgery Detection via Multi-modal Guidance
Haipeng Li 1, Rongxuan Peng 1,2, Anwei Luo 2, Shunquan Tan 3, Changsheng Chen 3, Anastasia Antsiferova 4
Published on arXiv
2602.06530
Input Manipulation Attack
OWASP ML Top 10 — ML01
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
ForgeryEraser causes substantial performance degradation on state-of-the-art AIGC detectors across both global synthesis and local editing benchmarks without any access to target detector parameters.
ForgeryEraser
Novel technique introduced
The rapid advancement of AI-Generated Content (AIGC) technologies poses significant challenges for authenticity assessment. However, existing evaluation protocols largely overlook anti-forensics attacks, failing to ensure the comprehensive robustness of state-of-the-art AIGC detectors in real-world applications. To bridge this gap, we propose ForgeryEraser, a framework designed to execute universal anti-forensics attacks without access to the target AIGC detectors. We reveal an adversarial vulnerability stemming from the systemic reliance on Vision-Language Models (VLMs) as shared backbones (e.g., CLIP), whereby downstream AIGC detectors inherit the feature space of these publicly accessible models. Instead of traditional logit-based optimization, we design a multi-modal guidance loss that erases forgery traces by driving forged-image embeddings in the VLM feature space toward text-derived authentic anchors while repelling them from forgery anchors. Extensive experiments demonstrate that ForgeryEraser causes substantial performance degradation of advanced AIGC detectors on both global synthesis and local editing benchmarks. Moreover, ForgeryEraser induces explainable forensic models to generate explanations consistent with authentic images for forged images. Our code will be made publicly available.
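The pull/push structure of the multi-modal guidance loss can be illustrated with a minimal toy sketch. The function names (`guidance_loss`, `cosine`) and the small vectors standing in for CLIP image embeddings and text-derived anchors are hypothetical illustrations, not the paper's implementation, which operates on real VLM embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def guidance_loss(img_emb, authentic_anchor, forgery_anchor):
    """Toy form of a multi-modal guidance objective: pull the forged image's
    embedding toward the text-derived 'authentic' anchor while pushing it away
    from the 'forgery' anchor. Lower is better for the attacker."""
    pull = 1.0 - cosine(img_emb, authentic_anchor)  # attract to authentic
    push = cosine(img_emb, forgery_anchor)          # repel from forgery
    return pull + push
```

Under this objective, an embedding aligned with the authentic anchor scores strictly lower than one aligned with the forgery anchor, which is exactly the direction a perturbation optimizer would follow.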
Key Contributions
- Reveals a systemic adversarial vulnerability arising from shared VLM backbones (CLIP) across diverse downstream AIGC detectors, enabling transferable black-box attacks without access to target detector parameters.
- Proposes ForgeryEraser, a multi-modal guidance loss that drives forged image embeddings toward text-derived authentic anchors while repelling them from forgery anchors in the VLM feature space.
- Demonstrates that ForgeryEraser causes substantial detector performance degradation on global synthesis and local editing benchmarks, and forces explainable forensic models to produce authentic-sounding justifications for forged images.
🛡️ Threat Analysis
ForgeryEraser crafts gradient-based adversarial perturbations applied to forged images at inference time, exploiting the shared VLM feature space to cause misclassification: a canonical evasion (input manipulation) attack against classifiers.
The attack specifically targets AI-generated content detection systems (AIGC detectors, deepfake/forgery detectors), defeating their ability to authenticate image content and even inducing explainable forensic models to generate false authentic justifications.
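The inference-time perturbation step described above can be sketched as a generic sign-gradient (PGD-style) evasion loop. Everything here is a hypothetical stand-in: `surrogate_forgery_score` is a toy linear score, not a real detector, and gradients come from finite differences rather than backpropagation through a VLM.

```python
def surrogate_forgery_score(pixels):
    """Hypothetical stand-in for a detector's forgery logit (toy linear model)."""
    weights = [0.9, -0.3, 0.5, 0.1]
    return sum(p * w for p, w in zip(pixels, weights))

def sign(x):
    return (x > 0) - (x < 0)

def pgd_sign_attack(pixels, steps=10, alpha=0.01, eps=0.05):
    """Iteratively perturb the input to lower the forgery score, keeping the
    perturbation inside an L-infinity ball of radius eps around the original."""
    x = list(pixels)
    h = 1e-4
    for _ in range(steps):
        # Finite-difference estimate of the score's gradient at x.
        base = surrogate_forgery_score(x)
        grad = []
        for i in range(len(x)):
            xp = list(x)
            xp[i] += h
            grad.append((surrogate_forgery_score(xp) - base) / h)
        # Step against the forgery score, then project back into the eps-ball.
        x = [xi - alpha * sign(g) for xi, g in zip(x, grad)]
        x = [min(max(xi, pi - eps), pi + eps) for xi, pi in zip(x, pixels)]
    return x
```

The same loop structure applies when the objective is a feature-space guidance loss instead of a detector logit; only the scoring function changes.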