benchmark 2025

Brought a Gun to a Knife Fight: Modern VFM Baselines Outgun Specialized Detectors on In-the-Wild AI Image Detection

Yue Zhou 1, Xinan He 1,2, Kaiqing Lin 1, Bing Fan 3, Feng Ding 2, Jinhua Zeng 4, Bin Li 1



Published on arXiv (arXiv:2509.12995)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

A linear probe on the Perception Encoder VFM achieves 96.1% average accuracy on the Chameleon in-the-wild benchmark, outperforming the best specialized detector (DDA, 74.3%) by over 20 percentage points.
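The paper's baseline is deliberately simple: a single linear classifier trained on frozen VFM embeddings. A minimal sketch of that setup is below, with random vectors standing in for the image features (in practice they would come from a frozen encoder such as the Perception Encoder; the 1024-dim width and the mean shift between classes are illustrative assumptions, not values from the paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-ins: in the paper, features come from a frozen VFM
# image encoder. Here, random vectors with a small mean shift simulate
# real vs. AI-generated embeddings so the sketch is runnable end to end.
rng = np.random.default_rng(0)
dim = 1024  # assumed embedding width, not taken from the paper
real_feats = rng.normal(0.0, 1.0, size=(500, dim))
fake_feats = rng.normal(0.3, 1.0, size=(500, dim))

X = np.concatenate([real_feats, fake_feats])
y = np.concatenate([np.zeros(500), np.ones(500)])  # 1 = AI-generated

# The "linear probe": a single logistic-regression layer on frozen features,
# with no fine-tuning of the backbone.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print(f"train accuracy: {probe.score(X, y):.3f}")
```

The point of the probe is that all discriminative power lives in the frozen features; the classifier itself adds almost no capacity.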


While specialized detectors for AI-generated images excel on curated benchmarks, they fail catastrophically in real-world scenarios, as evidenced by their critically high false-negative rates on "in-the-wild" benchmarks. Instead of crafting another specialized "knife" for this problem, we bring a "gun" to the fight: a simple linear classifier on a modern Vision Foundation Model (VFM). Trained on identical data, this baseline decisively "outguns" bespoke detectors, boosting in-the-wild accuracy by a striking margin of over 20%. Our analysis pinpoints the source of the VFM's "firepower": First, by probing text-image similarities, we find that recent VLMs (e.g., Perception Encoder, Meta CLIP2) have learned to align synthetic images with forgery-related concepts (e.g., "AI-generated"), unlike previous versions. Second, we speculate that this is due to data exposure, as both this alignment and overall accuracy plummet on a novel dataset scraped after the VFM's pre-training cut-off date, ensuring it was unseen during pre-training. Our findings yield two critical conclusions: 1) For the real-world "gunfight" of AI-generated image detection, the raw "firepower" of an updated VFM is far more effective than the "craftsmanship" of a static detector. 2) True generalization evaluation requires test data to be independent of the model's entire training history, including pre-training.
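The "probing text-image similarities" analysis compares how strongly an image embedding aligns with forgery-related text prompts versus neutral ones. A runnable sketch of that comparison is below, using random unit-scale vectors in place of real VLM embeddings (in practice both sides would come from the paired image and text encoders of a VLM such as Meta CLIP2; the prompt strings are illustrative, not the paper's exact wording):

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings: a real VLM would map the test image and each
# prompt into a shared space. Random vectors stand in to keep this runnable.
rng = np.random.default_rng(1)
image_emb = rng.normal(size=512)
prompts = ["a real photograph", "an AI-generated image"]
text_embs = {p: rng.normal(size=512) for p in prompts}

# Zero-shot probe: the image is assigned to whichever concept it aligns
# with more closely in embedding space.
scores = {p: cosine_sim(image_emb, e) for p, e in text_embs.items()}
prediction = max(scores, key=scores.get)
print(prediction, scores)
```

On the paper's account, recent VLMs score synthetic images noticeably higher against forgery-related prompts, while older versions do not; the probe itself is just this similarity comparison.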


Key Contributions

  • Shows that a simple linear classifier on a modern VFM (PE, MetaCLIP-2, DINOv3) outperforms state-of-the-art specialized forensic detectors by over 20 percentage points on in-the-wild benchmarks
  • Identifies that recent VLMs have implicitly learned to align synthetic images with forgery-related concepts (e.g., 'AI-generated'), explaining their superior detection performance
  • Introduces a verifiably unseen evaluation dataset composed of post-cutoff synthetic images and private photographs to enable true generalization assessment

🛡️ Threat Analysis

Output Integrity Attack

The paper directly addresses AI-generated image detection — a core output integrity / content provenance problem. It evaluates and improves detection of synthetic images (deepfakes, diffusion/GAN outputs) in the wild, introduces a verifiably unseen evaluation dataset, and analyzes why modern VFMs encode forgery-related semantics better than specialized forensic detectors.


Details

Domains
vision
Model Types
vlm, transformer
Threat Tags
inference_time
Datasets
GenImage, Chameleon
Applications
AI-generated image detection, multimedia forensics