AgentFoX: LLM Agent-Guided Fusion with eXplainability for AI-Generated Image Detection

Yangxin Yu 1, Yue Zhou 1, Bin Li 1, Kaiqing Lin 1, Haodong Li 1, Jiangqun Ni 2, Bo Cao 3

0 citations

Published on arXiv: 2603.23115

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Produces explainable forensic reports that integrate multiple detector outputs through LLM-guided reasoning, resolving conflicting judgments from specialized detectors

AgentFoX

Novel technique introduced


The increasing realism of AI-Generated Images (AIGI) has created an urgent need for forensic tools capable of reliably distinguishing synthetic content from authentic imagery. Existing detectors are typically tailored to specific forgery artifacts, such as frequency-domain patterns or semantic inconsistencies, leading to specialized performance and, at times, conflicting judgments. To address these limitations, we present AgentFoX, a Large Language Model-driven framework that redefines AIGI detection as a dynamic, multi-phase analytical process. Our approach employs a quick-integration fusion mechanism guided by a curated knowledge base comprising calibrated Expert Profiles and contextual Clustering Profiles. During inference, the agent begins with high-level semantic assessment, then transitions to fine-grained, context-aware synthesis of signal-level expert evidence, resolving contradictions through structured reasoning. Instead of returning a coarse binary output, AgentFoX produces a detailed, human-readable forensic report that substantiates its verdict, enhancing interpretability and trustworthiness for real-world deployment. Beyond providing a novel detection solution, this work introduces a scalable agentic paradigm that facilitates intelligent integration of future and evolving forensic tools.
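The abstract describes fusing conflicting detector outputs under calibrated Expert Profiles. A minimal sketch of what such reliability-weighted fusion could look like is below; `ExpertProfile`, `fuse_verdicts`, and the reliability values are hypothetical illustrations, not the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class ExpertProfile:
    """Calibrated reliability of one specialized detector (assumed in [0, 1],
    e.g. estimated on a held-out validation set)."""
    name: str
    reliability: float

def fuse_verdicts(scores: dict[str, float],
                  profiles: dict[str, ExpertProfile],
                  threshold: float = 0.5) -> tuple[str, float]:
    """Reliability-weighted average of per-detector fake probabilities."""
    total_w = sum(profiles[n].reliability for n in scores)
    fused = sum(scores[n] * profiles[n].reliability for n in scores) / total_w
    verdict = "AI-generated" if fused >= threshold else "authentic"
    return verdict, fused

profiles = {
    "freq": ExpertProfile("freq", 0.9),       # frequency-domain expert
    "semantic": ExpertProfile("semantic", 0.6),  # semantic-consistency expert
}
# Conflicting judgments: the frequency expert says fake, the semantic one says real.
verdict, score = fuse_verdicts({"freq": 0.85, "semantic": 0.30}, profiles)
# Weighted fusion sides with the more reliable frequency expert.
```

The point of the sketch is the conflict-resolution behavior: a detector's vote counts in proportion to its calibrated reliability, so disagreeing experts are reconciled rather than averaged blindly.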


Key Contributions

  • LLM-agent framework that orchestrates multiple specialized AIGI detectors through semantic reasoning and evidence synthesis
  • Quick-integration fusion mechanism guided by Expert Profiles and Clustering Profiles for context-aware detector combination
  • Human-readable forensic reports with structured reasoning chains that explain detection verdicts and resolve conflicting expert judgments
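The third contribution, a human-readable report with a reasoning chain, can be sketched as a simple report builder. The report structure, field names, and the `build_report` helper are assumptions for illustration only, not the paper's output format.

```python
def build_report(image_id: str, evidence: dict[str, float], verdict: str) -> str:
    """Assemble a human-readable forensic report from per-detector evidence,
    noting which experts disagreed with the final verdict."""
    lines = [
        f"Forensic report for {image_id}",
        f"Verdict: {verdict}",
        "Evidence:",
    ]
    # List detectors from strongest to weakest fake probability.
    for name, prob in sorted(evidence.items(), key=lambda kv: -kv[1]):
        lines.append(f"  - {name}: fake probability {prob:.2f}")
    # Flag experts whose individual call conflicts with the fused verdict.
    conflicts = [n for n, p in evidence.items()
                 if (p >= 0.5) != (verdict == "AI-generated")]
    if conflicts:
        lines.append("Conflicting experts overruled: " + ", ".join(conflicts))
    return "\n".join(lines)

report = build_report("img_001", {"freq": 0.85, "semantic": 0.30}, "AI-generated")
```

Surfacing the overruled experts explicitly is what makes the verdict auditable: a reader can see not just the decision but which evidence argued against it.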

🛡️ Threat Analysis

Output Integrity Attack

Core contribution is detecting AI-generated images (synthetic content detection) and verifying content authenticity through forensic analysis. The framework integrates multiple forgery detection methods to determine if images are AI-generated, which is quintessential output integrity verification.


Details

Domains
vision, multimodal, nlp
Model Types
llm, multimodal, gan, diffusion
Threat Tags
inference_time
Applications
ai-generated image detection, deepfake detection, digital forensics, content authenticity verification