Benchmark · 2025

Mirage: Unveiling Hidden Artifacts in Synthetic Images with Large Vision-Language Models

Pranav Sharma, Shivank Garg, Durga Toshniwal

0 citations · 35 references · arXiv


Published on arXiv · 2510.03840

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

LVLMs substantially outperform standard AI detectors on images with visible generative artifacts, but their accuracy drops to comparable levels when such cues are absent, exposing a reliance on perceptible artifacts for detection.

Mirage

Novel technique introduced


Abstract

Recent advances in image generation have produced models whose synthetic images are increasingly difficult for standard AI detectors to identify, even though they often remain distinguishable by humans. To examine this discrepancy, we introduce **Mirage**, a curated dataset comprising a diverse range of AI-generated images exhibiting visible artifacts, on which current state-of-the-art detection methods largely fail. We further investigate whether Large Vision-Language Models (LVLMs), which are increasingly employed as substitutes for human judgment in various tasks, can be leveraged for explainable AI image detection. Our experiments on both Mirage and existing benchmark datasets demonstrate that while LVLMs are highly effective at detecting AI-generated images with visible artifacts, their performance declines when confronted with images lacking such cues.


Key Contributions

  • Introduces Mirage, a curated dataset of 5,000 AI-generated images with visible artifacts (sourced from JourneyDB and DALL·E 3), specifically constructed because SOTA detectors largely fail on them
  • Proposes a taxonomy of nine artifact types in AI-generated images used to filter and rank images by artifact salience via Qwen-VL and CLIP similarity scoring
  • Demonstrates that LVLMs are highly effective at detecting AI-generated images with visible artifacts but degrade significantly on artifact-free synthetic images, revealing a fundamental gap in explainable detection
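The second contribution, ranking images by artifact salience against a fixed artifact taxonomy via similarity scoring, can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the nine artifact-type labels are placeholders for the paper's actual taxonomy, and the hand-rolled `cosine` over plain lists stands in for real CLIP image/text embeddings produced alongside Qwen-VL filtering.

```python
import math

# Placeholder labels standing in for the paper's nine-type artifact taxonomy.
ARTIFACT_TYPES = [
    "distorted hands", "garbled text", "asymmetric faces",
    "implausible reflections", "melted objects", "inconsistent lighting",
    "warped geometry", "texture repetition", "anatomical errors",
]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank_by_artifact_salience(image_embs, artifact_embs):
    """Rank images by their best match to any artifact-type embedding.

    image_embs: dict mapping image name -> embedding vector
    artifact_embs: one embedding vector per artifact type
    Returns (name, score) pairs, most artifact-salient first.
    """
    scores = {
        name: max(cosine(emb, art) for art in artifact_embs)
        for name, emb in image_embs.items()
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Taking the maximum over artifact types means an image needs to match only one artifact category strongly to rank high, which matches the curation goal of surfacing any visible flaw.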

🛡️ Threat Analysis

Output Integrity Attack

Directly addresses AI-generated image detection (output integrity and content authenticity) by curating a specialized benchmark dataset of synthetic images with visible artifacts and evaluating both standard detectors and LVLMs as forensic detection tools.
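Using an LVLM as an explainable forensic detector reduces to prompting the model to classify an image and justify the verdict with the artifacts it sees. A minimal sketch of such a harness is below, assuming a generic chat-style vision API; the prompt wording, message schema, and `parse_verdict` helper are hypothetical, not the paper's exact evaluation protocol.

```python
def build_detection_prompt(image_ref):
    """Build a chat-style message asking an LVLM to judge and explain.

    image_ref: opaque handle for the image (path, URL, or encoded bytes),
    depending on the serving API. The message schema here is an assumption.
    """
    return [{
        "role": "user",
        "content": [
            {"type": "image", "image": image_ref},
            {"type": "text",
             "text": ("Is this image real or AI-generated? "
                      "Answer 'real' or 'ai-generated' on the first line, "
                      "then list any visible artifacts that support "
                      "your answer.")},
        ],
    }]

def parse_verdict(reply):
    """Split a model reply into (verdict, explanation_lines)."""
    lines = [ln.strip() for ln in reply.strip().splitlines() if ln.strip()]
    verdict = lines[0].lower().rstrip(".")
    return verdict, lines[1:]
```

Asking for the explanation alongside the label is what makes the detector auditable, and it is also where the key finding bites: when an image has no perceptible artifacts, the model has nothing to cite and its verdict degrades accordingly.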


Details

Domains
vision · multimodal
Model Types
vlm · diffusion · transformer
Threat Tags
inference_time
Datasets
Mirage · JourneyDB · DALL·E 3
Applications
ai-generated image detection · deepfake detection · image forensics