defense 2026

Simplicity Prevails: The Emergence of Generalizable AIGI Detection in Visual Foundation Models

Yue Zhou 1, Xinan He 2,1, Kaiqing Lin 1, Bing Fan 3, Feng Ding 2, Bin Li 1

0 citations · 53 references · arXiv (Cornell University)


Published on arXiv · 2602.01738

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

A linear classifier on frozen VFM features boosts in-the-wild AIGI detection accuracy by over 30% compared to specialized detectors, while matching them on standard curated benchmarks.

VFM Linear Probe (frozen features + linear classifier)

Novel technique introduced


While specialized detectors for AI-Generated Images (AIGI) achieve near-perfect accuracy on curated benchmarks, they suffer a dramatic performance collapse in realistic, in-the-wild scenarios. In this work, we demonstrate that simplicity prevails over complex architectural designs. A simple linear classifier trained on the frozen features of modern Vision Foundation Models, including Perception Encoder, MetaCLIP 2, and DINOv3, establishes a new state of the art. Through a comprehensive evaluation spanning traditional benchmarks, unseen generators, and challenging in-the-wild distributions, we show that this baseline not only matches specialized detectors on standard benchmarks but also decisively outperforms them on in-the-wild datasets, boosting accuracy by striking margins of over 30%. We posit that this superior capability is an emergent property driven by the massive scale of pre-training data containing synthetic content. We trace the source of this capability to two distinct manifestations of data exposure: Vision-Language Models internalize an explicit semantic concept of forgery, while Self-Supervised Learning models implicitly acquire discriminative forensic features from the pretraining data. However, we also reveal persistent limitations: these models suffer performance degradation under recapture and transmission, and remain blind to VAE reconstruction and localized editing. We conclude by advocating a paradigm shift in AI forensics, moving from overfitting on static benchmarks to harnessing the evolving world knowledge of foundation models for real-world reliability.
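The detector described above is deliberately simple: the foundation-model backbone stays frozen, and only a linear classifier is trained on its features. A minimal sketch of that setup, using randomly generated synthetic features as a stand-in for frozen VFM embeddings (no actual backbone such as DINOv3 is loaded here, and the separable-direction construction is purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in for frozen VFM embeddings: in the paper's setup these would be
# features from a frozen backbone (e.g. DINOv3 or MetaCLIP 2), never fine-tuned.
rng = np.random.default_rng(0)
dim, n = 256, 500

# Hypothetical construction: "real" and "AI-generated" features differ along a
# single direction, mimicking a linearly separable forensic cue in feature space.
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)
real = rng.normal(size=(n, dim))
fake = rng.normal(size=(n, dim)) + 3.0 * direction

X = np.vstack([real, fake])
y = np.concatenate([np.zeros(n), np.ones(n)])  # 0 = real, 1 = AI-generated

# The entire detector: a plain linear probe on the (frozen) features.
probe = LogisticRegression(max_iter=1000).fit(X, y)
acc = probe.score(X, y)
print(f"linear-probe train accuracy: {acc:.3f}")
```

In practice the only swap needed is replacing the synthetic `X` with embeddings from a frozen backbone's forward pass; the probe itself stays a single linear layer.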


Key Contributions

  • Demonstrates that a simple linear classifier on frozen Vision Foundation Model features (Perception Encoder, MetaCLIP 2, DINOv3) achieves state-of-the-art AIGI detection, outperforming specialized detectors by over 30% on in-the-wild benchmarks.
  • Traces the emergent detection capability to two mechanisms: explicit semantic concept injection in VLMs via web-scale co-training with synthetic content, and implicit low-level forensic feature acquisition in SSL models like DINOv3.
  • Identifies persistent failure modes of VFM-based detectors, including degradation under recapture/transmission, and blindness to VAE reconstruction and localized image editing.

🛡️ Threat Analysis

Output Integrity Attack

Core contribution is detecting AI-generated images (AIGI) — a canonical output integrity and content authenticity problem. The paper proposes a new detection paradigm (frozen VFM features + linear probe) and evaluates it comprehensively across standard and in-the-wild benchmarks, directly advancing the AI-generated content detection subfield of ML09.


Details

Domains
vision
Model Types
transformer, vlm
Threat Tags
inference_time
Datasets
GenImage, AIGIHolmes, AIGI-Now, Chameleon, WildRF
Applications
AI-generated image detection, deepfake detection, media forensics