tool 2026

EvoGuard: An Extensible Agentic RL-based Framework for Practical and Evolving AI-Generated Image Detection

Chenyang Zhu 1,2, Maorong Wang 2, Jun Liu 2, Ching-Chun Chang 2, Isao Echizen 1,2


Published on arXiv

2603.17343

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves SOTA accuracy on AIGI detection while enabling train-free integration of new detectors and reducing annotation costs

EvoGuard

Novel technique introduced


The rapid proliferation of AI-Generated Images (AIGIs) has introduced severe risks of misinformation, making AIGI detection a critical yet challenging task. While traditional detection paradigms mainly rely on low-level features, recent research increasingly leverages the general understanding ability of Multimodal Large Language Models (MLLMs) to achieve better generalization. These approaches, however, still suffer from limited extensibility and costly training-data annotation. To better address complex and dynamic real-world environments, we propose EvoGuard, a novel agentic framework for AIGI detection. It encapsulates various state-of-the-art (SOTA) off-the-shelf MLLM and non-MLLM detectors as callable tools and coordinates them through a capability-aware dynamic orchestration mechanism. Empowered by the agent's capacities for autonomous planning and reflection, it selects suitable tools for a given sample, reflects on intermediate results, and decides the next action, reaching a final conclusion through multi-turn invocation and reasoning. This design exploits the complementary strengths of heterogeneous detectors, transcending the limits of any single model. Furthermore, because it is optimized by a GRPO-based Agentic Reinforcement Learning algorithm using only low-cost binary labels, it eliminates the reliance on fine-grained annotations. Extensive experiments demonstrate that EvoGuard achieves SOTA accuracy while mitigating the bias between positive and negative samples. More importantly, it allows the plug-and-play integration of new detectors to boost overall performance in a train-free manner, offering a highly practical, long-term solution to ever-evolving AIGI threats. Source code will be publicly available upon acceptance.
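
To make the "GRPO with only binary labels" idea concrete, the following is a minimal sketch of the group-relative advantage step that GRPO is generally known for: several rollouts are sampled for the same image, each is scored, and each score is normalized against the group's mean and standard deviation. The reward function here (1 for a correct real/fake verdict, 0 otherwise) is an assumption chosen for illustration; the paper's actual reward design is not given on this page, and the names `binary_reward` and `group_relative_advantages` are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def binary_reward(predicted: str, label: str) -> float:
    """Assumed reward: 1.0 if the final verdict matches the binary label, else 0.0."""
    return 1.0 if predicted == label else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each rollout's reward against its own sampling group.

    No learned value function is involved: the group mean acts as the baseline.
    If every rollout gets the same reward, all advantages are zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts for one image whose ground-truth label is "fake".
verdicts = ["fake", "real", "fake", "fake"]
rewards = [binary_reward(v, "fake") for v in verdicts]
advantages = group_relative_advantages(rewards)
```

In this sketch the single wrong rollout receives a negative advantage and the three correct ones a positive advantage, so only a real/fake label per image is needed to produce a training signal.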


Key Contributions

  • Agentic framework that orchestrates heterogeneous AIGI detectors (MLLM and non-MLLM) as callable tools with dynamic selection
  • GRPO-based reinforcement learning optimization using only binary labels, eliminating fine-grained annotation requirements
  • Plug-and-play extensibility allowing train-free integration of new detectors to adapt to evolving generative models
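
The first and third contributions (detectors as callable tools, and train-free addition of new ones) can be sketched as a small tool registry plus a multi-turn loop. This is an illustration only, not EvoGuard's implementation: `ToolRegistry`, `run_agent`, and `simple_policy` are hypothetical names, the detectors are dummy functions, the hand-written `simple_policy` stands in for the RL-trained agent that would choose tools in the real system, and the confidence-weighted vote is an assumed aggregation rule.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A detector maps an image path to (verdict, confidence); verdict is "real" or "fake".
Detector = Callable[[str], Tuple[str, float]]
History = List[Tuple[str, str, float]]  # (tool name, verdict, confidence)


@dataclass
class ToolRegistry:
    tools: Dict[str, Detector] = field(default_factory=dict)
    descriptions: Dict[str, str] = field(default_factory=dict)

    def register(self, name: str, fn: Detector, description: str) -> None:
        """Add a detector at run time; nothing is retrained."""
        self.tools[name] = fn
        self.descriptions[name] = description


def simple_policy(registry: ToolRegistry, history: History) -> Optional[str]:
    """Stand-in for the learned agent: stop once a tool is confident, else try an unused tool."""
    if history and history[-1][2] >= 0.9:
        return None
    used = {name for name, _, _ in history}
    for name in registry.tools:
        if name not in used:
            return name
    return None


def run_agent(image: str, registry: ToolRegistry, policy, max_turns: int = 3):
    """Multi-turn loop: pick a tool, observe its result, decide whether to continue."""
    history: History = []
    for _ in range(max_turns):
        name = policy(registry, history)
        if name is None:
            break
        verdict, conf = registry.tools[name](image)
        history.append((name, verdict, conf))
    # Assumed aggregation: confidence-weighted vote over all tool calls.
    score = sum(c if v == "fake" else -c for _, v, c in history)
    return ("fake" if score > 0 else "real"), history


registry = ToolRegistry()
registry.register("freq_detector", lambda img: ("real", 0.55), "low-level frequency cues")
registry.register("mllm_detector", lambda img: ("fake", 0.70), "semantic reasoning")
verdict, history = run_agent("sample.png", registry, simple_policy)

# Plug-and-play: a new detector becomes available without any retraining.
registry.register("new_detector", lambda img: ("fake", 0.95), "covers a newer generator")
verdict2, history2 = run_agent("sample.png", registry, simple_policy)
```

The point of the design is that the agent only sees tool names, descriptions, and returned results, so extending the registry changes what it can call without changing the agent itself.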

🛡️ Threat Analysis

Output Integrity Attack

The paper addresses AI-generated image detection, which is a core ML09 task — verifying content authenticity and determining whether images are synthetic. The framework detects AI-generated content to establish provenance, directly falling under output integrity.


Details

Domains
vision, multimodal, nlp
Model Types
multimodal, llm
Threat Tags
inference_time
Applications
ai-generated image detection, deepfake detection, content authenticity verification