When Detectors Forget Forensics: Blocking Semantic Shortcuts for Generalizable AI-Generated Image Detection
Chao Shuai 1, Zhenguang Liu 1, Shaojing Fan 2, Bin Gong 1, Weichen Lian 1, Xiuli Bi 3, Zhongjie Ba 1, Kui Ren 1
Published on arXiv: 2603.09242
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
GSD achieves 94.4% video-level AUC in cross-dataset evaluation (+1.2% over SOTA) and improves robustness to unseen manipulations by +3.0% on DF40 by removing semantic shortcuts from VFM representations.
Geometric Semantic Decoupling (GSD)
Novel technique introduced
AI-generated image detection has become increasingly important with the rapid advancement of generative AI. However, detectors built on Vision Foundation Models (VFMs, e.g., CLIP) often struggle to generalize to images created using unseen generation pipelines. We identify, for the first time, a key failure mechanism, termed semantic fallback, where VFM-based detectors rely on dominant pre-trained semantic priors (such as identity) rather than forgery-specific traces under distribution shifts. To address this issue, we propose Geometric Semantic Decoupling (GSD), a parameter-free module that explicitly removes semantic components from learned representations by leveraging a frozen VFM as a semantic guide with a trainable VFM as an artifact detector. GSD estimates semantic directions from batch-wise statistics and projects them out via a geometric constraint, forcing the artifact detector to rely on semantic-invariant forensic evidence. Extensive experiments demonstrate that our method consistently outperforms state-of-the-art approaches, achieving 94.4% video-level AUC (+1.2%) in cross-dataset evaluation, improving robustness to unseen manipulations (+3.0% on DF40), and generalizing beyond faces to the detection of synthetic images of general scenes, including UniversalFakeDetect (+0.9%) and GenImage (+1.7%).
Key Contributions
- Identifies 'semantic fallback' — a novel failure mechanism where VFM-based detectors collapse onto semantic priors (e.g., identity) rather than forensic traces under distribution shift
- Proposes Geometric Semantic Decoupling (GSD), a parameter-free module that uses QR decomposition to project out semantic directions from learned representations, forcing detectors to rely on semantic-invariant forgery evidence
- Demonstrates consistent SOTA improvements across cross-dataset deepfake detection (+1.2% AUC), unseen manipulation robustness (+3.0% on DF40), and general synthetic image detection (GenImage +1.7%, UniversalFakeDetect +0.9%)
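The projection step named in the contributions can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that semantic directions are obtained by centering a batch of frozen-VFM features and taking the first `k` columns of the Q factor from a reduced QR decomposition. The function name `project_out_semantics`, the parameter `k`, and the centering step are all hypothetical choices made for illustration.

```python
import numpy as np


def project_out_semantics(artifact_feats, semantic_feats, k=4):
    """Remove an estimated semantic subspace from artifact features.

    artifact_feats: (B, D) features from the trainable encoder.
    semantic_feats: (B, D) features from the frozen encoder.
    k: number of semantic directions to remove (hypothetical parameter).

    Returns the cleaned features and the (D, k) orthonormal basis used.
    """
    # Assumption: center over the batch so that the directions reflect
    # batch-wise variation rather than the mean feature.
    centered = semantic_feats - semantic_feats.mean(axis=0, keepdims=True)

    # Reduced QR of the (D, B) matrix. The columns of q are orthonormal,
    # and the first k columns span the first k centered feature vectors.
    q, _ = np.linalg.qr(centered.T)
    q = q[:, :k]

    # Orthogonal projection onto the complement of span(q):
    # x_clean = x - (x q) q^T. No learnable parameters are involved.
    cleaned = artifact_feats - (artifact_feats @ q) @ q.T
    return cleaned, q
```

After this step, the cleaned features have zero component along every removed direction, so a classifier trained on them cannot use those directions as a shortcut. How the paper selects and weights the directions may differ from this sketch.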
🛡️ Threat Analysis
Primary contribution is a novel forensic detection architecture for AI-generated/synthetic images and deepfakes — directly addresses output integrity and content provenance by improving detectors' ability to distinguish real from AI-generated content across unseen generative pipelines.