Redefining Generalization in Visual Domains: A Two-Axis Framework for Fake Image Detection with FusionDetect
Amirtaha Amanzadi , Zahra Dehghanian , Hamid Beigy , Hamid R. Rabiee
Published on arXiv
2510.05740
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
FusionDetect outperforms its closest competitor by +3.87% accuracy and +6.13% average precision on established benchmarks, and gains +4.48% accuracy on the new OmniGen cross-domain benchmark.
FusionDetect
Novel technique introduced
The rapid development of generative models has made it increasingly crucial to develop detectors that can reliably detect synthetic images. Although most prior work has focused on cross-generator generalization, we argue that this viewpoint is too limited. Detecting synthetic images involves another equally important challenge: generalization across visual domains. To bridge this gap, we present the OmniGen Benchmark. This comprehensive evaluation dataset incorporates 12 state-of-the-art generators, providing a more realistic way of evaluating detector performance. In addition, we introduce a new method, FusionDetect, aimed at addressing both axes of generalization. FusionDetect draws on the benefits of two frozen foundation models: CLIP and DINOv2. By deriving features from both complementary models, we develop a cohesive feature space that naturally adapts to changes in both image content and generator design. Our extensive experiments demonstrate that FusionDetect not only sets a new state of the art, with 3.87% higher accuracy and 6.13% higher average precision than its closest competitor on established benchmarks, but also achieves a 4.48% increase in accuracy on OmniGen, along with exceptional robustness to common image perturbations. We introduce not only a top-performing detector, but also a new benchmark and framework for furthering universal AI image detection. The code and dataset are available at http://github.com/amir-aman/FusionDetect
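The fusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that fusion means concatenating two independently L2-normalized embeddings and that a simple logistic head sits on top, neither of which the abstract specifies. The function names, the embedding sizes, and the random stand-in vectors (used here in place of real CLIP and DINOv2 outputs) are all hypothetical.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(semantic_feat, structural_feat):
    """Assumed fusion rule: normalize each embedding, then concatenate."""
    return l2_normalize(semantic_feat) + l2_normalize(structural_feat)


def predict_fake_probability(fused, weights, bias):
    """Hypothetical logistic head on the fused vector: P(image is synthetic)."""
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Random stand-ins for the outputs of the two frozen backbones.
rng = random.Random(0)
clip_like = [rng.gauss(0, 1) for _ in range(8)]   # stand-in for a semantic embedding
dino_like = [rng.gauss(0, 1) for _ in range(6)]   # stand-in for a structural embedding

fused = fuse_features(clip_like, dino_like)
weights = [0.0] * len(fused)                      # untrained head: no evidence either way
prob = predict_fake_probability(fused, weights, bias=0.0)
```

Normalizing each embedding before concatenation keeps either backbone from dominating the fused space purely because of its scale. Because both backbones are frozen, only the small head would need training in this sketch.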
Key Contributions
- FusionDetect: a novel AI-generated image detector fusing CLIP (semantic) and DINOv2 (structural/textural) frozen foundation model features to generalize across both generators and visual domains
- OmniGen Benchmark: a new evaluation dataset spanning 12 SOTA generators designed to test cross-generator and cross-visual-domain generalization simultaneously
- Two-axis generalization framework that extends the conventional cross-generator axis with a second, cross-visual-domain axis for more realistic detector evaluation
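One way to picture the two-axis framework is as a grid of (generator, visual domain) cells, with accuracy reported per cell and then averaged along each axis. The sketch below assumes that layout; it is not the OmniGen evaluation protocol, and the generator names, domain names, and records are invented purely for illustration.

```python
from collections import defaultdict


def two_axis_accuracy(records):
    """Compute accuracy per (generator, domain) cell and per-axis averages.

    records: iterable of (generator, domain, label, prediction) tuples,
    where label and prediction are 1 for synthetic and 0 for real.
    """
    counts = defaultdict(lambda: [0, 0])  # (generator, domain) -> [correct, total]
    for generator, domain, label, prediction in records:
        counts[(generator, domain)][0] += int(label == prediction)
        counts[(generator, domain)][1] += 1

    cell_acc = {key: correct / total for key, (correct, total) in counts.items()}

    by_generator, by_domain = defaultdict(list), defaultdict(list)
    for (generator, domain), acc in cell_acc.items():
        by_generator[generator].append(acc)
        by_domain[domain].append(acc)

    def mean(values):
        return sum(values) / len(values)

    generator_acc = {g: mean(v) for g, v in by_generator.items()}
    domain_acc = {d: mean(v) for d, v in by_domain.items()}
    return cell_acc, generator_acc, domain_acc


# Hypothetical evaluation records: (generator, domain, label, prediction).
records = [
    ("gen_a", "portrait", 1, 1), ("gen_a", "portrait", 0, 0),
    ("gen_a", "landscape", 1, 0), ("gen_a", "landscape", 0, 0),
    ("gen_b", "portrait", 1, 1), ("gen_b", "portrait", 0, 1),
    ("gen_b", "landscape", 1, 0), ("gen_b", "landscape", 0, 1),
]
cell_acc, generator_acc, domain_acc = two_axis_accuracy(records)
```

Reporting the two marginals separately shows whether a detector degrades mainly on unseen generators, on unseen visual domains, or on both. A single pooled accuracy would hide that distinction.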
🛡️ Threat Analysis
FusionDetect is a novel architecture for detecting AI-generated images, directly addressing output integrity and content authenticity. The OmniGen Benchmark evaluates detectors on synthetic images from 12 SOTA generators along both the cross-generator and cross-visual-domain axes. Both contributions squarely target ML09: authenticating and verifying the provenance of AI-generated content.