Redefining Generalization in Visual Domains: A Two-Axis Framework for Fake Image Detection with FusionDetect

The rapid development of generative models has made it increasingly crucial to develop detectors that can reliably detect synthetic images. Although most of the work has now focused on cross-generator generalization, we argue that this viewpoint is too limited. Detecting synthetic images involves another equally important challenge: generalization across visual domains. To bridge this gap,we present the OmniGen Benchmark. This comprehensive evaluation dataset incorporates 12 state-of-the-art generators, providing a more realistic way of evaluating detector performance under realistic conditions. In addition, we introduce a new method, FusionDetect, aimed at addressing both vectors of generalization. FusionDetect draws on the benefits of two frozen foundation models: CLIP & Dinov2. By deriving features from both complementary models,we develop a cohesive feature space that naturally adapts to changes in both thecontent and design of the generator. Our extensive experiments demonstrate that FusionDetect delivers not only a new state-of-the-art, which is 3.87% more accurate than its closest competitor and 6.13% more precise on average on established benchmarks, but also achieves a 4.48% increase in accuracy on OmniGen,along with exceptional robustness to common image perturbations. We introduce not only a top-performing detector, but also a new benchmark and framework for furthering universal AI image detection. The code and dataset are available at http://github.com/amir-aman/FusionDetect

Key Contributions

FusionDetect: a novel AI-generated image detector fusing CLIP (semantic) and DINOv2 (structural/textural) frozen foundation model features to generalize across both generators and visual domains
OmniGen Benchmark: a new evaluation dataset spanning 12 SOTA generators designed to test cross-generator and cross-visual-domain generalization simultaneously
Two-axis generalization framework that extends the conventional cross-generator axis with a second, cross-visual-domain axis for more realistic detector evaluation

🛡️ Threat Analysis

Output Integrity Attack

FusionDetect is a novel AI-generated image detection architecture — directly addressing output integrity and content authenticity (detecting synthetic images from 12 SOTA generators). OmniGen Benchmark evaluates detectors across cross-generator and cross-visual-domain axes. Both contributions squarely target ML09: authenticating and verifying the provenance of AI-generated content.