From Evidence to Verdict: An Agent-Based Forensic Framework for AI-Generated Image Detection

The rapid evolution of AI-generated images poses unprecedented challenges to information integrity and media authenticity. Existing detection approaches suffer from fundamental limitations: traditional classifiers lack interpretability and fail to generalize across evolving generative models, while vision-language models (VLMs), despite their promise, remain constrained to single-shot analysis and pixel-level reasoning. To address these challenges, we introduce AIFo (Agent-based Image Forensics), a novel training-free framework that emulates human forensic investigation through multi-agent collaboration. Unlike conventional methods, our framework employs a set of forensic tools, including reverse image search, metadata extraction, pre-trained classifiers, and VLM analysis, coordinated by specialized LLM-based agents that collect, synthesize, and reason over cross-source evidence. When evidence is conflicting or insufficient, a structured multi-agent debate mechanism allows agents to exchange arguments and reach a reliable conclusion. Furthermore, we enhance the framework with a memory-augmented reasoning module that learns from historical cases to improve future detection accuracy. Our comprehensive evaluation spans 6,000 images across both controlled laboratory settings and challenging real-world scenarios, including images from modern generative platforms and diverse online sources. AIFo achieves 97.05% accuracy, substantially outperforming traditional classifiers and state-of-the-art VLMs. These results demonstrate that agent-based procedural reasoning offers a new paradigm for more robust, interpretable, and adaptable AI-generated image detection.

Key Contributions

AIFo: a training-free multi-agent framework that orchestrates forensic tools (reverse image search, EXIF metadata, pre-trained classifiers, VLM analysis) via specialized LLM-based agents for AI-generated image detection
Structured multi-agent debate mechanism that resolves conflicting or insufficient evidence to produce reliable verdicts
Memory-augmented reasoning module that learns from historical detection cases to improve accuracy on future inputs
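
The control flow implied by these contributions can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The `Evidence` record, the 0-to-1 score convention, the `low`/`high` thresholds, and the function names are all assumptions made for the example. The forensic tools and the debate step are passed in as plain callables because the summary does not specify their interfaces, and the memory module is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical verdict labels; the paper's actual label set is not given here.
AI, REAL = "ai-generated", "real"


@dataclass
class Evidence:
    source: str      # e.g. "reverse_search", "metadata", "classifier", "vlm"
    score: float     # assumed convention: 0.0 = looks real, 1.0 = looks AI-generated
    note: str = ""   # free-text explanation produced by the tool


Tool = Callable[[bytes], Evidence]
Debate = Callable[[List[Evidence]], str]


def collect_evidence(image: bytes, tools: Dict[str, Tool]) -> List[Evidence]:
    """Run every forensic tool on the image and gather its findings."""
    return [tool(image) for tool in tools.values()]


def needs_debate(evidence: List[Evidence], low: float = 0.35, high: float = 0.65) -> bool:
    """Return True when evidence is conflicting or insufficient (assumed thresholds)."""
    says_ai = any(e.score >= high for e in evidence)
    says_real = any(e.score <= low for e in evidence)
    conflicting = says_ai and says_real          # sources point in opposite directions
    insufficient = not says_ai and not says_real  # no source is decisive (or no evidence)
    return conflicting or insufficient


def verdict(image: bytes, tools: Dict[str, Tool], debate: Debate,
            low: float = 0.35, high: float = 0.65) -> str:
    """Decide directly when evidence agrees; otherwise defer to the debate callable."""
    evidence = collect_evidence(image, tools)
    if needs_debate(evidence, low, high):
        return debate(evidence)
    # At this point exactly one side has decisive evidence.
    return AI if any(e.score >= high for e in evidence) else REAL


if __name__ == "__main__":
    # Stub tools with fixed scores, standing in for real forensic tools.
    tools = {
        "metadata": lambda img: Evidence("metadata", 0.9, "no camera fields"),
        "classifier": lambda img: Evidence("classifier", 0.8),
    }
    print(verdict(b"", tools, debate=lambda ev: "inconclusive"))
```

Passing the debate step in as a callable keeps the escalation rule separate from whatever the agents actually do when they exchange arguments, which this summary describes only at a high level.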

🛡️ Threat Analysis

Output Integrity Attack

The paper proposes AIFo, a framework for detecting AI-generated images, and so addresses output integrity directly by verifying whether an image is AI-generated. Its primary contribution is a forensic detection architecture that combines multi-source evidence, agent debate, and memory-augmented reasoning, which places it squarely in the AI-generated content detection sub-category of ML09.