Benchmark · 2025

Mirage: Unveiling Hidden Artifacts in Synthetic Images with Large Vision-Language Models

Pranav Sharma, Shivank Garg, Durga Toshniwal

0 citations · 35 references · arXiv


Published on arXiv · 2510.03840

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

LVLMs substantially outperform standard AI detectors on images with visible generative artifacts, but their accuracy drops to comparable levels when such cues are absent, exposing a reliance on perceptible artifacts for detection.

Mirage

Novel technique introduced


Abstract

Recent advances in image generation have produced models whose synthetic images are increasingly difficult for standard AI detectors to identify, even though they often remain distinguishable by humans. To examine this discrepancy, we introduce **Mirage**, a curated dataset comprising a diverse range of AI-generated images exhibiting visible artifacts, on which current state-of-the-art detection methods largely fail. We further investigate whether Large Vision-Language Models (LVLMs), which are increasingly employed as substitutes for human judgment in various tasks, can be leveraged for explainable AI image detection. Our experiments on both Mirage and existing benchmark datasets demonstrate that while LVLMs are highly effective at detecting AI-generated images with visible artifacts, their performance declines when confronted with images lacking such cues.


Key Contributions

  • Introduces Mirage, a curated dataset of 5,000 AI-generated images with visible artifacts (sourced from JourneyDB and DALL·E 3), specifically constructed because SOTA detectors largely fail on them
  • Proposes a taxonomy of nine artifact types in AI-generated images used to filter and rank images by artifact salience via Qwen-VL and CLIP similarity scoring
  • Demonstrates that LVLMs are highly effective at detecting AI-generated images with visible artifacts but degrade significantly on artifact-free synthetic images, revealing a fundamental gap in explainable detection
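The second contribution, ranking images by artifact salience against a fixed artifact taxonomy via similarity scoring, can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the nine artifact-type labels are placeholders for the paper's actual taxonomy, and the hand-rolled `cosine` over plain lists stands in for real CLIP image/text embeddings produced alongside Qwen-VL filtering.

```python
import math

# Placeholder labels standing in for the paper's nine-type artifact taxonomy.
ARTIFACT_TYPES = [
    "distorted hands", "garbled text", "asymmetric faces",
    "implausible reflections", "melted objects", "inconsistent lighting",
    "warped geometry", "texture repetition", "anatomical errors",
]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank_by_artifact_salience(image_embs, artifact_embs):
    """Rank images by their best match to any artifact-type embedding.

    image_embs: dict mapping image name -> embedding vector
    artifact_embs: one embedding vector per artifact type
    Returns (name, score) pairs, most artifact-salient first.
    """
    scores = {
        name: max(cosine(emb, art) for art in artifact_embs)
        for name, emb in image_embs.items()
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Taking the maximum over artifact types means an image needs to match only one artifact category strongly to rank high, which matches the curation goal of surfacing any visible flaw.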

🛡️ Threat Analysis

Output Integrity Attack

Directly addresses AI-generated image detection (output integrity and content authenticity) by curating a specialized benchmark dataset of synthetic images with visible artifacts and evaluating both standard detectors and LVLMs as forensic detection tools.
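Using an LVLM as an explainable forensic detector reduces to prompting the model to classify an image and justify the verdict with the artifacts it sees. A minimal sketch of such a harness is below, assuming a generic chat-style vision API; the prompt wording, message schema, and `parse_verdict` helper are hypothetical, not the paper's exact evaluation protocol.

```python
def build_detection_prompt(image_ref):
    """Build a chat-style message asking an LVLM to judge and explain.

    image_ref: opaque handle for the image (path, URL, or encoded bytes),
    depending on the serving API. The message schema here is an assumption.
    """
    return [{
        "role": "user",
        "content": [
            {"type": "image", "image": image_ref},
            {"type": "text",
             "text": ("Is this image real or AI-generated? "
                      "Answer 'real' or 'ai-generated' on the first line, "
                      "then list any visible artifacts that support "
                      "your answer.")},
        ],
    }]

def parse_verdict(reply):
    """Split a model reply into (verdict, explanation_lines)."""
    lines = [ln.strip() for ln in reply.strip().splitlines() if ln.strip()]
    verdict = lines[0].lower().rstrip(".")
    return verdict, lines[1:]
```

Asking for the explanation alongside the label is what makes the detector auditable, and it is also where the key finding bites: when an image has no perceptible artifacts, the model has nothing to cite and its verdict degrades accordingly.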


Details

Domains
vision · multimodal
Model Types
vlm · diffusion · transformer
Threat Tags
inference_time
Datasets
Mirage · JourneyDB · DALL·E 3
Applications
ai-generated image detection · deepfake detection · image forensics