Rethinking Cross-Generator Image Forgery Detection through DINOv3
Zhenglin Huang 1, Jason Li 2, Haiquan Wen 1, Tianxiao Li 1, Xi Yang 3, Lu Qi 4, Bei Peng 5, Xiaowei Huang 1, Ming-Hsuan Yang 4, Guangliang Cheng 1
Published on arXiv
2511.22471
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Frozen DINOv3 features with a lightweight linear probe and training-free token selection achieve strong cross-generator detection performance without any fine-tuning on target generators.
Novel technique introduced
DINOv3 Token Ranking
As generative models become increasingly diverse and powerful, cross-generator detection has emerged as a new challenge. Existing detection methods often memorize artifacts of specific generative models rather than learning transferable cues, leading to substantial failures on unseen generators. Surprisingly, this work finds that frozen visual foundation models, especially DINOv3, already exhibit strong cross-generator detection capability without any fine-tuning. Through systematic studies on frequency, spatial, and token perspectives, we observe that DINOv3 tends to rely on global, low-frequency structures as weak but transferable authenticity cues instead of high-frequency, generator-specific artifacts. Motivated by this insight, we introduce a simple, training-free token-ranking strategy followed by a lightweight linear probe to select a small subset of authenticity-relevant tokens. This token subset consistently improves detection accuracy across all evaluated datasets. Our study provides empirical evidence and a feasible hypothesis for understanding why foundation models generalize across diverse generators, offering a universal, efficient, and interpretable baseline for image forgery detection.
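The token-ranking and linear-probe pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the summary does not state the ranking criterion, so the sketch assumes each token position is scored by the distance between its real and fake class means on a small labeled reference set. The tokens are synthetic stand-ins for frozen DINOv3 patch tokens, and every name (`make_tokens`, `rank_token_positions`, `pool_selected`, `train_linear_probe`, `predict`, `SIGNAL_POSITIONS`) is hypothetical.

```python
import math
import random

NUM_POSITIONS, DIM = 8, 4      # toy sizes, far smaller than a real ViT grid
SIGNAL_POSITIONS = (2, 5)      # hypothetical positions carrying the cue


def make_tokens(rng, is_fake):
    """Synthetic stand-in for the frozen patch tokens of one image."""
    tokens = [[rng.gauss(0.0, 0.3) for _ in range(DIM)]
              for _ in range(NUM_POSITIONS)]
    if is_fake:
        for pos in SIGNAL_POSITIONS:
            tokens[pos] = [v + 1.0 for v in tokens[pos]]
    return tokens


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def rank_token_positions(real_images, fake_images):
    """Training-free ranking: score each position by the distance between
    its real and fake class means (an assumed criterion)."""
    scores = []
    for pos in range(len(real_images[0])):
        real_mean = mean_vector([img[pos] for img in real_images])
        fake_mean = mean_vector([img[pos] for img in fake_images])
        scores.append(math.dist(real_mean, fake_mean))
    return sorted(range(len(scores)), key=lambda p: scores[p], reverse=True)


def pool_selected(image, positions):
    """Average only the selected tokens into one feature vector."""
    return mean_vector([image[p] for p in positions])


def train_linear_probe(features, labels, epochs=200, lr=0.5):
    """Logistic regression fitted with full-batch gradient descent."""
    w, b = [0.0] * len(features[0]), 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (fake) if the probe's logit is positive, else 0 (real)."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


# Rank positions on a small reference set, keep the top two, fit the probe.
rng = random.Random(0)
real = [make_tokens(rng, False) for _ in range(20)]
fake = [make_tokens(rng, True) for _ in range(20)]
top_k = rank_token_positions(real, fake)[:2]
features = [pool_selected(img, top_k) for img in real + fake]
probe = train_linear_probe(features, [0] * 20 + [1] * 20)
print("selected token positions:", sorted(top_k))
```

In a real setting the token lists would come from a frozen backbone's patch embeddings, and a tensor library would replace the pure-Python loops. Only the probe's weights are learned, and the ranking step involves no gradient updates.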
Key Contributions
- Empirical finding that frozen DINOv3 features generalize across unseen generators by relying on global, low-frequency structural cues rather than generator-specific high-frequency artifacts
- Training-free token-ranking strategy that selects a small subset of authenticity-relevant ViT tokens, consistently improving cross-generator detection accuracy
- Systematic analysis from frequency, spatial, and token perspectives, providing an interpretable hypothesis for why vision foundation models generalize across generators in image forgery detection
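The frequency-perspective analysis in the contributions above depends on separating an image's low-frequency structure from its high-frequency detail. The sketch below shows one simple way to do that split. It assumes a box blur as the low-pass filter, which is a stand-in chosen for brevity; the summary does not say which filter the authors use. The function names (`box_blur`, `split_frequency_bands`, `energy`) are hypothetical.

```python
def box_blur(image, radius=1):
    """Low-pass filter: replace each pixel with the mean of its neighborhood.

    `image` is a list of rows of floats (grayscale). At the borders, only the
    part of the window that falls inside the image is averaged.
    """
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[yy][xx]
                for yy in range(max(0, y - radius), min(height, y + radius + 1))
                for xx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(sum(window) / len(window))
        blurred.append(row)
    return blurred


def split_frequency_bands(image, radius=1):
    """Return (low, high), where high is the residual image - low."""
    low = box_blur(image, radius)
    high = [[p - l for p, l in zip(row, low_row)]
            for row, low_row in zip(image, low)]
    return low, high


def energy(image):
    """Sum of squared pixel values, a crude measure of band content."""
    return sum(p * p for row in image for p in row)


# A smooth horizontal gradient keeps most of its energy in the low band,
# while a checkerboard pushes most of its energy into the high band.
gradient = [[x / 7 for x in range(8)] for _ in range(8)]
checker = [[1.0 if (x + y) % 2 == 0 else -1.0 for x in range(8)]
           for y in range(8)]
for name, img in (("gradient", gradient), ("checker", checker)):
    low, high = split_frequency_bands(img)
    print(name, "low > high:", energy(low) > energy(high))
```

To probe which band a frozen detector relies on, one could feed it the `low` image alone and compare its accuracy against the unfiltered input. A small drop would be consistent with the low-frequency hypothesis stated above.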
🛡️ Threat Analysis
The paper directly addresses AI-generated image detection (image forgery detection): its core task is distinguishing real images from synthetic ones produced by diverse generative models, which falls under output integrity and content authenticity. It contributes novel forensic analysis and a detection methodology (token ranking plus a linear probe) rather than merely applying existing detectors to a new domain.