defense 2026

When Detectors Forget Forensics: Blocking Semantic Shortcuts for Generalizable AI-Generated Image Detection

Chao Shuai 1, Zhenguang Liu 1, Shaojing Fan 2, Bin Gong 1, Weichen Lian 1, Xiuli Bi 3, Zhongjie Ba 1, Kui Ren 1



Published on arXiv

2603.09242

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

GSD achieves 94.4% video-level AUC in cross-dataset evaluation (+1.2% over SOTA) and improves robustness to unseen manipulations by +3.0% on DF40 by removing semantic shortcuts from VFM representations.

Geometric Semantic Decoupling (GSD)

Novel technique introduced


AI-generated image detection has become increasingly important with the rapid advancement of generative AI. However, detectors built on Vision Foundation Models (VFMs, *e.g.*, CLIP) often struggle to generalize to images created with unseen generation pipelines. We identify, for the first time, a key failure mechanism, termed *semantic fallback*, in which VFM-based detectors fall back on dominant pre-trained semantic priors (such as identity) rather than forgery-specific traces under distribution shift. To address this issue, we propose **Geometric Semantic Decoupling (GSD)**, a parameter-free module that explicitly removes semantic components from learned representations by pairing a frozen VFM, acting as a semantic guide, with a trainable VFM, acting as an artifact detector. GSD estimates semantic directions from batch-wise statistics and projects them out via a geometric constraint, forcing the artifact detector to rely on semantic-invariant forensic evidence. Extensive experiments demonstrate that our method consistently outperforms state-of-the-art approaches: it achieves 94.4% video-level AUC (**+1.2%**) in cross-dataset evaluation, improves robustness to unseen manipulations (**+3.0%** on DF40), and generalizes beyond faces to detecting synthetic images of general scenes, on UniversalFakeDetect (**+0.9%**) and GenImage (**+1.7%**).


Key Contributions

  • Identifies 'semantic fallback' — a novel failure mechanism where VFM-based detectors collapse onto semantic priors (e.g., identity) rather than forensic traces under distribution shift
  • Proposes Geometric Semantic Decoupling (GSD), a parameter-free module that uses QR decomposition to project out semantic directions from learned representations, forcing detectors to rely on semantic-invariant forgery evidence
  • Demonstrates consistent SOTA improvements across cross-dataset deepfake detection (+1.2% AUC), unseen manipulation robustness (+3.0% on DF40), and general synthetic image detection (GenImage +1.7%, UniversalFakeDetect +0.9%)
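The core projection step described above can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: it assumes the semantic subspace is estimated from the frozen VFM's batch features (here via SVD, with QR used to orthonormalize the retained directions, following the QR decomposition mentioned in the contribution), and the function name and `k` parameter are hypothetical.

```python
import numpy as np

def geometric_semantic_decoupling(artifact_feats, semantic_feats, k=4):
    """Hypothetical sketch of GSD-style semantic projection.

    artifact_feats: (B, D) features from the trainable VFM branch
    semantic_feats: (B, D) features from the frozen VFM (semantic guide)
    k: number of semantic directions to remove (illustrative choice)
    """
    # Batch-wise statistics: center the semantic features
    centered = semantic_feats - semantic_feats.mean(axis=0, keepdims=True)
    # Estimate dominant semantic directions from the centered batch
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Orthonormal basis Q of the top-k semantic directions via QR
    q, _ = np.linalg.qr(vt[:k].T)          # shape (D, k)
    # Geometric constraint: remove the semantic subspace, f - Q Q^T f
    return artifact_feats - artifact_feats @ q @ q.T
```

After this projection, the artifact features carry no component along the estimated semantic directions, which is what forces the detector toward semantic-invariant forensic cues.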

🛡️ Threat Analysis

Output Integrity Attack

The primary contribution is a novel forensic detection architecture for AI-generated images and deepfakes. It directly addresses output integrity and content provenance by improving a detector's ability to distinguish real from AI-generated content across unseen generative pipelines.


Details

Domains
vision, generative
Model Types
transformer, diffusion, gan
Threat Tags
inference_time
Datasets
FaceForensics++, Celeb-DF, DF40, UniversalFakeDetect, GenImage
Applications
deepfake detection, ai-generated image detection, synthetic image forensics