
MIRAGE: Towards AI-Generated Image Detection in the Wild

Cheng Xia, Manxi Lin, Jiexiang Tan, Xiaoxiong Du, Yang Qiu, Junjun Zheng, Xiangheng Kong, Yuning Jiang, Bo Zheng


Published on arXiv: 2508.13223

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Mirage-R1 outperforms state-of-the-art detectors by 5% on the Mirage benchmark and 10% on public benchmarks

Mirage-R1

Novel technique introduced


The spread of AI-generated images (AIGI), driven by advances in generative AI, poses a significant threat to information security and public trust. Existing AIGI detectors, while effective against images in clean laboratory settings, fail to generalize to in-the-wild scenarios. These real-world images are noisy, ranging from "obviously fake" images to realistic ones derived from multiple generative models and further edited for quality control. We address in-the-wild AIGI detection in this paper. We introduce Mirage, a challenging benchmark designed to emulate the complexity of in-the-wild AIGI. Mirage is constructed from two sources: (1) a large corpus of Internet-sourced AIGI verified by human experts, and (2) a synthesized dataset created through the collaboration of multiple expert generators, closely simulating realistic AIGI in the wild. Building on this benchmark, we propose Mirage-R1, a vision-language model with heuristic-to-analytic reasoning, a reflective reasoning mechanism for AIGI detection. Mirage-R1 is trained in two stages: a supervised fine-tuning cold start, followed by a reinforcement learning stage. By further adopting an inference-time adaptive thinking strategy, Mirage-R1 can provide either a quick judgment or a more robust and accurate conclusion, effectively balancing inference speed and performance. Extensive experiments show that our model leads state-of-the-art detectors by 5% and 10% on Mirage and public benchmarks, respectively. The benchmark and code will be made publicly available.


Key Contributions

  • Mirage benchmark: first dataset focused on in-the-wild AIGI detection, combining internet-sourced human-curated fakes with composite multi-model pipeline-generated photorealistic images
  • Mirage-R1: a VLM with heuristic-to-analytic reasoning trained via SFT cold start followed by RL, supporting reflective self-correction of initial impressions
  • Inference-time adaptive thinking strategy that selects between fast judgment and deliberate chain-of-thought based on model confidence, balancing speed and accuracy
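The adaptive thinking strategy above can be sketched as a confidence-gated dispatch: a fast single-pass judgment is returned when the model is confident, and the slower reflective chain-of-thought path is invoked otherwise. The following is a minimal illustrative sketch; the function names, interfaces, and threshold are hypothetical and not from the paper.

```python
def adaptive_detect(image, fast_judge, deliberate_reason, threshold=0.9):
    """Confidence-gated adaptive inference (illustrative sketch).

    Returns the fast verdict when the quick pass is confident enough,
    otherwise falls back to deliberate chain-of-thought reasoning.
    """
    label, confidence = fast_judge(image)
    if confidence >= threshold:
        return label, "fast"
    return deliberate_reason(image), "deliberate"


# Toy stand-ins for the two inference paths (hypothetical):
def fast_judge(image):
    # e.g. a single forward pass yielding (label, confidence)
    return ("ai-generated", 0.95) if image == "obvious_fake" else ("real", 0.6)

def deliberate_reason(image):
    # e.g. full reflective reasoning over visual evidence
    return "ai-generated" if "fake" in image else "real"
```

In this scheme, easy inputs pay only the cost of the fast pass, while ambiguous inputs trigger the more expensive but more reliable reasoning path, which matches the paper's stated speed/accuracy trade-off.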

🛡️ Threat Analysis

Output Integrity Attack

Directly addresses AI-generated image detection: determining whether images are AI-generated is an output integrity/content provenance problem explicitly listed under ML09. Mirage-R1 is a novel detection architecture (not merely an application of existing methods), combining RL training, heuristic-to-analytic reasoning, and adaptive inference.


Details

Domains
vision
Model Types
vlm, transformer
Threat Tags
inference_time
Datasets
Mirage, GenImage, CNNSpot
Applications
ai-generated image detection, deepfake detection, content moderation, information security