Cognitive Inception: Agentic Reasoning against Visual Deceptions by Injecting Skepticism
Yinjie Zhao 1,2, Heng Zhao 1, Bihan Wen 1,2, Joey Tianyi Zhou 1
Published on arXiv
2511.17672
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Inception achieves SOTA performance on the AEGIS benchmark with a large margin of improvement over the strongest existing LLM baselines for AIGC visual deception detection.
Inception
Novel technique introduced
With the development of AI-generated content (AIGC), multi-modal Large Language Models (LLMs) struggle to distinguish generated visual inputs from real ones. This shortcoming creates a vulnerability to visual deceptions, in which models are misled by generated content and the reliability of their reasoning is jeopardized. Facing rapidly emerging generative models and diverse data distributions, it is therefore vital to improve LLMs' generalizable reasoning for verifying the authenticity of visual inputs against potential deceptions. Inspired by human cognitive processes, we discover that LLMs exhibit a tendency to over-trust visual inputs, while injecting skepticism significantly improves the models' visual cognitive capability against visual deceptions. Based on this discovery, we propose **Inception**, a fully reasoning-based agentic framework that conducts generalizable authenticity verification by injecting skepticism, iteratively enhancing the LLM's reasoning logic between External Skeptic and Internal Skeptic agents. To the best of our knowledge, this is the first fully reasoning-based framework against AIGC visual deceptions. Our approach achieves a large margin of improvement over the strongest existing LLM baselines and SOTA performance on the AEGIS benchmark.
Key Contributions
- Discovery that LLMs over-trust visual inputs and that injecting skepticism significantly improves authenticity verification capability
- Inception: a fully reasoning-based agentic framework with External Skeptic and Internal Skeptic agents that iteratively refine reasoning logic for AIGC detection
- First fully reasoning-based framework against AIGC visual deceptions, achieving SOTA on the AEGIS benchmark
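The paper's External/Internal Skeptic iteration can be pictured as a simple verdict-refinement loop. The sketch below is a toy illustration, not the paper's implementation: the function names, the artifact-cue heuristic, and the confidence arithmetic are all assumptions standing in for the LLM agents' actual prompted reasoning.

```python
# Toy sketch of an iterative skepticism-injection loop.
# All names and heuristics here are illustrative assumptions; the real
# Inception agents are prompted LLMs, not rule-based functions.

def base_verdict(image):
    # Stand-in for a VLM's initial judgment, which tends to over-trust inputs.
    return {"label": "real", "confidence": 0.9}

def external_skeptic(image, verdict):
    # Raise doubts from externally observable cues (toy: listed artifacts).
    return list(image.get("artifacts", []))

def internal_skeptic(verdict, doubts):
    # Re-weigh the current verdict against the injected doubts.
    conf = max(verdict["confidence"] - 0.3 * len(doubts), 0.0)
    label = "fake" if conf < 0.5 else verdict["label"]
    return {"label": label, "confidence": conf}

def inception_loop(image, max_rounds=3):
    # Iteratively refine the verdict until the reasoning stabilizes.
    verdict = base_verdict(image)
    for _ in range(max_rounds):
        doubts = external_skeptic(image, verdict)
        refined = internal_skeptic(verdict, doubts)
        if refined == verdict:  # skepticism no longer changes the verdict
            break
        verdict = refined
    return verdict
```

With two suspicious cues present, the injected doubts flip the over-trusting initial "real" verdict to "fake"; with no cues, the verdict stabilizes immediately.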
🛡️ Threat Analysis
The paper directly addresses AI-generated content detection, specifically VLMs being deceived by AIGC visual inputs. Its primary contribution is a novel detection architecture (Inception) for verifying the authenticity of visual inputs, which falls squarely under ML09's coverage of AI-generated content detection (deepfake and synthetic-image detection).