
EvolveReason: Self-Evolving Reasoning Paradigm for Explainable Deepfake Facial Image Identification

Binjia Zhou 1,2, Dawei Luo 2, Shuai Chen 2, Feng Xu 2, Seow 2, Haoyuan Li 1, Jiachi Wang 1, Jiawen Wang 2, Zunlei Feng 1, Yijun Bei 1



Published on arXiv

2603.07515

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

EvolveReason outperforms state-of-the-art deepfake detection methods in identification accuracy while providing reliable textual explanations of forgery traces with reduced hallucination.

EvolveReason

Novel technique introduced


With the rapid advancement of AIGC technology, developing identification methods to address the security challenges posed by deepfakes has become urgent. Face forgery identification techniques fall into two types: traditional classification methods and explainable VLM approaches. The former provides classification results but lacks explanatory ability, while the latter, although capable of providing coarse-grained explanations, often suffers from hallucinations and insufficient detail. To overcome these limitations, we propose EvolveReason, which mimics the reasoning and observational processes of human auditors when identifying face forgeries. By constructing a chain-of-thought dataset, CoT-Face, tailored for advanced VLMs, our approach guides the model to reason in a human-like way, prompting it to output both its reasoning process and its judgment. This provides practitioners with reliable analysis and helps alleviate hallucinations. Additionally, our framework incorporates a forgery latent-space distribution capture module, enabling EvolveReason to identify high-frequency forgery cues that are difficult to extract from the original images. To further enhance the reliability of textual explanations, we introduce a self-evolution exploration strategy that leverages reinforcement learning, allowing the model to iteratively explore and optimize its textual descriptions in a two-stage process. Experimental results show that EvolveReason not only outperforms current state-of-the-art methods in identification performance but also accurately identifies forgery details and demonstrates strong generalization.
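The abstract describes the model emitting a stepwise reasoning trace (global to local observations) followed by a judgment. A minimal sketch of what consuming such a trace could look like is below; the "Step N: ... / Verdict: ..." format and all field names are illustrative assumptions, not the paper's actual schema.

```python
# Hypothetical sketch of parsing a chain-of-thought forgery trace of the kind
# EvolveReason is described as producing. The trace format is an assumption.
from dataclasses import dataclass, field

@dataclass
class ForgeryCoT:
    observations: list = field(default_factory=list)  # ordered global-to-local notes
    verdict: str = "unknown"                          # "real" or "fake"

def parse_cot(text: str) -> ForgeryCoT:
    """Parse a 'Step N: ...' reasoning trace ending in 'Verdict: real|fake'."""
    cot = ForgeryCoT()
    for line in text.strip().splitlines():
        line = line.strip()
        if line.lower().startswith("step"):
            cot.observations.append(line.split(":", 1)[1].strip())
        elif line.lower().startswith("verdict"):
            cot.verdict = line.split(":", 1)[1].strip().lower()
    return cot

trace = """
Step 1: Global lighting on the left cheek is inconsistent with the scene.
Step 2: Blending boundary visible along the jawline.
Verdict: fake
"""
result = parse_cot(trace)
print(result.verdict, len(result.observations))  # fake 2
```

Keeping the observations ordered from global to local mirrors the human-auditor workflow the paper emphasizes, and makes each verdict auditable against the cues that produced it.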


Key Contributions

  • EvolveReason framework that guides VLMs to reason in a human-auditor-like chain-of-thought manner for explainable deepfake facial image detection
  • CoT-Face dataset with 5,900+ samples containing multi-level forgery trace annotations (global to local) for training VLMs
  • Self-evolving reasoning strategy using reinforcement learning with a distribution consistency constraint to iteratively improve forgery identification accuracy and textual explanation reliability
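The third contribution pairs reinforcement learning with a distribution consistency constraint. A common way to realize such a constraint is a KL-penalized reward; the sketch below is an illustrative assumption of that shape (the weighting `beta` and the exact penalty form are not taken from the paper).

```python
# Hedged sketch of a reward with a distribution-consistency constraint:
# task reward for a correct verdict, minus a KL penalty that discourages the
# evolving policy from drifting away from a reference distribution.
# beta and the KL form are illustrative assumptions, not the paper's method.
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def self_evolve_reward(correct: bool, policy_dist, ref_dist, beta=0.1):
    """1.0 for a correct verdict, 0.0 otherwise, minus beta-weighted KL drift."""
    return (1.0 if correct else 0.0) - beta * kl_divergence(policy_dist, ref_dist)

# An unchanged distribution incurs no penalty; a drifted one is penalized.
same = [0.5, 0.3, 0.2]
drifted = [0.7, 0.2, 0.1]
print(self_evolve_reward(True, same, same))        # 1.0
print(self_evolve_reward(True, drifted, same) < 1.0)  # True
```

The penalty term is what keeps iterative self-exploration from rewarding fluent but unfaithful explanations: gains in the task reward must outweigh distributional drift from the reference model.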

🛡️ Threat Analysis

Output Integrity Attack

The primary contribution is a novel deepfake detection framework for AI-generated facial images, directly defending the authenticity and integrity of visual content, which maps to canonical ML09 (Output Integrity Attack) as detection of AI-generated content.


Details

Domains
vision, multimodal
Model Types
vlm, transformer
Threat Tags
inference_time
Datasets
CoT-Face, DD-VQA
Applications
deepfake detection, facial image forgery identification