defense 2026

When Detectors Forget Forensics: Blocking Semantic Shortcuts for Generalizable AI-Generated Image Detection

Chao Shuai 1, Zhenguang Liu 1, Shaojing Fan 2, Bin Gong 1, Weichen Lian 1, Xiuli Bi 3, Zhongjie Ba 1, Kui Ren 1



Published on arXiv

2603.09242

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

GSD achieves 94.4% video-level AUC in cross-dataset evaluation (+1.2% over SOTA) and improves robustness to unseen manipulations by +3.0% on DF40 by removing semantic shortcuts from VFM representations.

Geometric Semantic Decoupling (GSD)

Novel technique introduced


AI-generated image detection has become increasingly important with the rapid advancement of generative AI. However, detectors built on Vision Foundation Models (VFMs, *e.g.*, CLIP) often struggle to generalize to images created with unseen generation pipelines. We identify, for the first time, a key failure mechanism, termed *semantic fallback*, in which VFM-based detectors fall back on dominant pre-trained semantic priors (such as identity) rather than forgery-specific traces under distribution shift. To address this issue, we propose **Geometric Semantic Decoupling (GSD)**, a parameter-free module that explicitly removes semantic components from learned representations by pairing a frozen VFM, acting as a semantic guide, with a trainable VFM, acting as an artifact detector. GSD estimates semantic directions from batch-wise statistics and projects them out via a geometric constraint, forcing the artifact detector to rely on semantic-invariant forensic evidence. Extensive experiments demonstrate that our method consistently outperforms state-of-the-art approaches: it achieves 94.4% video-level AUC (**+1.2%**) in cross-dataset evaluation, improves robustness to unseen manipulations (**+3.0%** on DF40), and generalizes beyond faces to detecting synthetic images of general scenes, on UniversalFakeDetect (**+0.9%**) and GenImage (**+1.7%**).


Key Contributions

  • Identifies 'semantic fallback' — a novel failure mechanism where VFM-based detectors collapse onto semantic priors (e.g., identity) rather than forensic traces under distribution shift
  • Proposes Geometric Semantic Decoupling (GSD), a parameter-free module that uses QR decomposition to project out semantic directions from learned representations, forcing detectors to rely on semantic-invariant forgery evidence
  • Demonstrates consistent SOTA improvements across cross-dataset deepfake detection (+1.2% AUC), unseen manipulation robustness (+3.0% on DF40), and general synthetic image detection (GenImage +1.7%, UniversalFakeDetect +0.9%)
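The core projection step described above can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: it assumes the semantic subspace is estimated from the frozen VFM's batch features (here via SVD, with QR used to orthonormalize the retained directions, following the QR decomposition mentioned in the contribution), and the function name and `k` parameter are hypothetical.

```python
import numpy as np

def geometric_semantic_decoupling(artifact_feats, semantic_feats, k=4):
    """Hypothetical sketch of GSD-style semantic projection.

    artifact_feats: (B, D) features from the trainable VFM branch
    semantic_feats: (B, D) features from the frozen VFM (semantic guide)
    k: number of semantic directions to remove (illustrative choice)
    """
    # Batch-wise statistics: center the semantic features
    centered = semantic_feats - semantic_feats.mean(axis=0, keepdims=True)
    # Estimate dominant semantic directions from the centered batch
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Orthonormal basis Q of the top-k semantic directions via QR
    q, _ = np.linalg.qr(vt[:k].T)          # shape (D, k)
    # Geometric constraint: remove the semantic subspace, f - Q Q^T f
    return artifact_feats - artifact_feats @ q @ q.T
```

After this projection, the artifact features carry no component along the estimated semantic directions, which is what forces the detector toward semantic-invariant forensic cues.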

🛡️ Threat Analysis

Output Integrity Attack

The primary contribution is a novel forensic detection architecture for AI-generated images and deepfakes. It directly addresses output integrity and content provenance by improving a detector's ability to distinguish real from AI-generated content across unseen generative pipelines.


Details

Domains
vision, generative
Model Types
transformer, diffusion, gan
Threat Tags
inference_time
Datasets
FaceForensics++, Celeb-DF, DF40, UniversalFakeDetect, GenImage
Applications
deepfake detection, ai-generated image detection, synthetic image forensics