Universal Anti-forensics Attack against Image Forgery Detection via Multi-modal Guidance
Haipeng Li 1, Rongxuan Peng 1,2, Anwei Luo 2, Shunquan Tan 3, Changsheng Chen 3, Anastasia Antsiferova 4
Published on arXiv
2602.06530
Input Manipulation Attack
OWASP ML Top 10 — ML01
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
ForgeryEraser causes substantial performance degradation on state-of-the-art AIGC detectors across both global synthesis and local editing benchmarks without any access to target detector parameters.
ForgeryEraser
Novel technique introduced
The rapid advancement of AI-Generated Content (AIGC) technologies poses significant challenges for authenticity assessment. However, existing evaluation protocols largely overlook anti-forensics attacks, failing to ensure the comprehensive robustness of state-of-the-art AIGC detectors in real-world applications. To bridge this gap, we propose ForgeryEraser, a framework designed to execute universal anti-forensics attacks without access to the target AIGC detectors. We reveal an adversarial vulnerability stemming from the systemic reliance on Vision-Language Models (VLMs) as shared backbones (e.g., CLIP), whereby downstream AIGC detectors inherit the feature space of these publicly accessible models. Instead of traditional logit-based optimization, we design a multi-modal guidance loss that erases forgery traces by driving forged-image embeddings in the VLM feature space toward text-derived authentic anchors while repelling them from forgery anchors. Extensive experiments demonstrate that ForgeryEraser causes substantial performance degradation of advanced AIGC detectors on both global synthesis and local editing benchmarks. Moreover, ForgeryEraser induces explainable forensic models to generate explanations consistent with authentic images for forged images. Our code will be made publicly available.
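The pull/push structure of the multi-modal guidance loss can be illustrated with a minimal toy sketch. The function names (`guidance_loss`, `cosine`) and the small vectors standing in for CLIP image embeddings and text-derived anchors are hypothetical illustrations, not the paper's implementation, which operates on real VLM embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def guidance_loss(img_emb, authentic_anchor, forgery_anchor):
    """Toy form of a multi-modal guidance objective: pull the forged image's
    embedding toward the text-derived 'authentic' anchor while pushing it away
    from the 'forgery' anchor. Lower is better for the attacker."""
    pull = 1.0 - cosine(img_emb, authentic_anchor)  # attract to authentic
    push = cosine(img_emb, forgery_anchor)          # repel from forgery
    return pull + push
```

Under this objective, an embedding aligned with the authentic anchor scores strictly lower than one aligned with the forgery anchor, which is exactly the direction a perturbation optimizer would follow.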
Key Contributions
- Reveals a systemic adversarial vulnerability arising from shared VLM backbones (CLIP) across diverse downstream AIGC detectors, enabling transferable black-box attacks without access to target detector parameters.
- Proposes ForgeryEraser, a multi-modal guidance loss that drives forged image embeddings toward text-derived authentic anchors while repelling them from forgery anchors in the VLM feature space.
- Demonstrates that ForgeryEraser causes substantial detector performance degradation on global synthesis and local editing benchmarks, and forces explainable forensic models to produce authentic-sounding justifications for forged images.
🛡️ Threat Analysis
ForgeryEraser crafts gradient-based adversarial perturbations applied to forged images at inference time, exploiting the shared VLM feature space to cause misclassification: a canonical evasion (input manipulation) attack against classifiers.
The attack specifically targets AI-generated content detection systems (AIGC detectors, deepfake/forgery detectors), defeating their ability to authenticate image content and even inducing explainable forensic models to generate false authentic justifications.
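The inference-time perturbation step described above can be sketched as a generic sign-gradient (PGD-style) evasion loop. Everything here is a hypothetical stand-in: `surrogate_forgery_score` is a toy linear score, not a real detector, and gradients come from finite differences rather than backpropagation through a VLM.

```python
def surrogate_forgery_score(pixels):
    """Hypothetical stand-in for a detector's forgery logit (toy linear model)."""
    weights = [0.9, -0.3, 0.5, 0.1]
    return sum(p * w for p, w in zip(pixels, weights))

def sign(x):
    return (x > 0) - (x < 0)

def pgd_sign_attack(pixels, steps=10, alpha=0.01, eps=0.05):
    """Iteratively perturb the input to lower the forgery score, keeping the
    perturbation inside an L-infinity ball of radius eps around the original."""
    x = list(pixels)
    h = 1e-4
    for _ in range(steps):
        # Finite-difference estimate of the score's gradient at x.
        base = surrogate_forgery_score(x)
        grad = []
        for i in range(len(x)):
            xp = list(x)
            xp[i] += h
            grad.append((surrogate_forgery_score(xp) - base) / h)
        # Step against the forgery score, then project back into the eps-ball.
        x = [xi - alpha * sign(g) for xi, g in zip(x, grad)]
        x = [min(max(xi, pi - eps), pi + eps) for xi, pi in zip(x, pixels)]
    return x
```

The same loop structure applies when the objective is a feature-space guidance loss instead of a detector logit; only the scoring function changes.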