Unlocking the Forgery Detection Potential of Vanilla MLLMs: A Novel Training-Free Pipeline
Rui Zuo 1, Qinyue Tong 1, Zhe-Ming Lu 1, Ziqian Lu 2
Published on arXiv
2511.13442
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Foresee outperforms existing MLLM-based IFDL methods in tamper localization accuracy and textual explanation richness across six forgery types without any additional training
Foresee
Novel technique introduced
With the rapid advancement of artificial intelligence-generated content (AIGC) technologies, including multimodal large language models (MLLMs) and diffusion models, image generation and manipulation have become remarkably effortless. Existing image forgery detection and localization (IFDL) methods often struggle to generalize across diverse datasets and offer limited interpretability. Nowadays, MLLMs demonstrate strong generalization potential across diverse vision-language tasks, and some studies introduce this capability to IFDL via large-scale training. However, such approaches cost considerable computational resources, while failing to reveal the inherent generalization potential of vanilla MLLMs to address this problem. Inspired by this observation, we propose Foresee, a training-free MLLM-based pipeline tailored for image forgery analysis. It eliminates the need for additional training and enables a lightweight inference process, while surpassing existing MLLM-based methods in both tamper localization accuracy and the richness of textual explanations. Foresee employs a type-prior-driven strategy and utilizes a Flexible Feature Detector (FFD) module to specifically handle copy-move manipulations, thereby effectively unleashing the potential of vanilla MLLMs in the forensic domain. Extensive experiments demonstrate that our approach simultaneously achieves superior localization accuracy and provides more comprehensive textual explanations. Moreover, Foresee exhibits stronger generalization capability, outperforming existing IFDL methods across various tampering types, including copy-move, splicing, removal, local enhancement, deepfake, and AIGC-based editing. The code will be released in the final version.
Key Contributions
- Training-free MLLM-based image forgery detection and localization pipeline (Foresee) eliminating costly fine-tuning while surpassing existing MLLM-based IFDL methods
- Flexible Feature Detector (FFD) module specifically designed to handle copy-move manipulations within vanilla MLLMs
- Type-prior-driven strategy that unlocks generalization of vanilla MLLMs across six forgery types: copy-move, splicing, removal, local enhancement, deepfake, and AIGC-based editing
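The contributions above suggest a routing idea: a forgery-type prior selects how the image is analyzed, with copy-move cases sent to a dedicated module. The sketch below is a minimal illustration of that idea only, not the authors' implementation. The names `analyze`, `build_prompt`, and `Analysis`, the prompt wording, and the assumption that the type prior arrives as a plain string are all hypothetical; the actual FFD and MLLM calls are replaced by labels.

```python
from dataclasses import dataclass

# The six forgery types named in the source.
FORGERY_TYPES = (
    "copy-move",
    "splicing",
    "removal",
    "local enhancement",
    "deepfake",
    "AIGC-based editing",
)


@dataclass
class Analysis:
    forgery_type: str  # the type prior supplied by the caller
    handler: str       # which branch would process the image
    prompt: str        # text that would accompany the image


def build_prompt(forgery_type: str) -> str:
    # Hypothetical prompt template; the paper's real prompts are not given here.
    return (
        f"Suspected manipulation type: {forgery_type}. "
        "Describe the tampered region and explain the visual evidence."
    )


def analyze(forgery_type: str) -> Analysis:
    """Route one image analysis request based on a forgery-type prior."""
    if forgery_type not in FORGERY_TYPES:
        raise ValueError(f"unknown forgery type: {forgery_type!r}")
    # Assumption: copy-move goes to a dedicated branch (a stand-in for the
    # FFD module); every other type is handled by the vanilla MLLM alone.
    handler = "ffd" if forgery_type == "copy-move" else "mllm"
    return Analysis(forgery_type, handler, build_prompt(forgery_type))


if __name__ == "__main__":
    for t in FORGERY_TYPES:
        result = analyze(t)
        print(f"{result.forgery_type:20s} -> {result.handler}")
```

Keeping the type prior as an explicit input makes the routing decision easy to inspect; how Foresee actually obtains that prior is not described in this summary.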
🛡️ Threat Analysis
Proposes Foresee, a training-free image forgery detection and localization pipeline built on vanilla MLLMs, for identifying copy-move, splicing, removal, local enhancement, deepfake, and AIGC-based edits. It directly targets output integrity and content provenance verification. The paper's primary contribution is a novel forensic detection technique, not the application of existing detectors to a new domain.