defense 2025

Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning

Hao Tan 1,2,3, Jun Lan 3, Zichang Tan 2, Ajian Liu 2, Chuanbiao Song 2, Senyuan Shi 3, Huijia Zhu 3, Weiqiang Wang 3, Jun Wan 1,2, Zhen Lei 1,2


Published on arXiv: 2508.21048

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Veritas achieves significant gains over SOTA detectors in cross-forgery and cross-domain OOD scenarios where prior methods fall short, while delivering transparent chain-of-thought detection outputs.

Veritas (with P-GRPO and MiPO)

Novel technique introduced


Deepfake detection remains a formidable challenge due to the complex and evolving nature of fake content in real-world scenarios. However, existing academic benchmarks diverge sharply from industrial practice, typically featuring homogeneous training sources and low-quality testing images, which hinders the practical deployment of current detectors. To mitigate this gap, we introduce HydraFake, a dataset that simulates real-world challenges with hierarchical generalization testing. Specifically, HydraFake involves diversified deepfake techniques and in-the-wild forgeries, along with a rigorous training and evaluation protocol covering unseen model architectures, emerging forgery techniques, and novel data domains. Building on this resource, we propose Veritas, a multi-modal large language model (MLLM) based deepfake detector. Unlike vanilla chain-of-thought (CoT), we introduce pattern-aware reasoning that incorporates critical reasoning patterns such as "planning" and "self-reflection" to emulate the human forensic process. We further propose a two-stage training pipeline to seamlessly internalize these deepfake reasoning capabilities into current MLLMs. Experiments on the HydraFake dataset reveal that although previous detectors generalize well in cross-model scenarios, they fall short on unseen forgeries and data domains. Veritas achieves significant gains across different OOD scenarios and is capable of delivering transparent and faithful detection outputs.


Key Contributions

  • HydraFake dataset with hierarchical OOD evaluation protocol (cross-model, cross-forgery, cross-domain) that better simulates real-world industrial deepfake detection challenges
  • Pattern-aware reasoning framework for deepfake detection incorporating planning and self-reflection patterns inspired by human forensic processes
  • Two-stage training pipeline (MiPO cold-start + P-GRPO exploration) that grounds MLLM reasoning capabilities into generalizable deepfake detection

🛡️ Threat Analysis

Output Integrity Attack

The primary contribution is a novel deepfake (AI-generated facial image) detection system with new forensic reasoning patterns; it falls directly under AI-generated content detection and output integrity verification.


Details

Domains
vision, multimodal
Model Types
VLM, transformer
Threat Tags
inference_time
Datasets
HydraFake, FaceForensics++, Celeb-DF, DFDC, WildDeepfake
Applications
deepfake detection, facial image authentication, AI-generated content detection