defense 2025

Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning

Hao Tan 1,2,3, Jun Lan 3, Zichang Tan 2, Ajian Liu 2, Chuanbiao Song 2, Senyuan Shi 3, Huijia Zhu 3, Weiqiang Wang 3, Jun Wan 1,2, Zhen Lei 1,2


Published on arXiv: 2508.21048

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Veritas achieves significant gains over SOTA detectors in cross-forgery and cross-domain OOD scenarios where prior methods fall short, while delivering transparent chain-of-thought detection outputs.

Veritas (with P-GRPO and MiPO)

Novel technique introduced


Deepfake detection remains a formidable challenge due to the complex and evolving nature of fake content in real-world scenarios. However, existing academic benchmarks diverge sharply from industrial practice, typically featuring homogeneous training sources and low-quality testing images, which hinders the practical deployment of current detectors. To mitigate this gap, we introduce HydraFake, a dataset that simulates real-world challenges with hierarchical generalization testing. Specifically, HydraFake involves diversified deepfake techniques and in-the-wild forgeries, along with a rigorous training and evaluation protocol covering unseen model architectures, emerging forgery techniques, and novel data domains. Building on this resource, we propose Veritas, a multi-modal large language model (MLLM) based deepfake detector. Unlike vanilla chain-of-thought (CoT), we introduce pattern-aware reasoning that incorporates critical reasoning patterns such as "planning" and "self-reflection" to emulate the human forensic process. We further propose a two-stage training pipeline to seamlessly internalize these deepfake reasoning capabilities into current MLLMs. Experiments on the HydraFake dataset reveal that although previous detectors generalize well in cross-model scenarios, they fall short on unseen forgeries and data domains. Veritas achieves significant gains across different OOD scenarios and is capable of delivering transparent and faithful detection outputs.


Key Contributions

  • HydraFake dataset with hierarchical OOD evaluation protocol (cross-model, cross-forgery, cross-domain) that better simulates real-world industrial deepfake detection challenges
  • Pattern-aware reasoning framework for deepfake detection incorporating planning and self-reflection patterns inspired by human forensic processes
  • Two-stage training pipeline (MiPO cold-start + P-GRPO exploration) that grounds MLLM reasoning capabilities into generalizable deepfake detection

🛡️ Threat Analysis

Output Integrity Attack

The primary contribution is a novel deepfake (AI-generated facial image) detection system with new forensic reasoning patterns; it falls directly under AI-generated content detection and output integrity verification.


Details

Domains
vision, multimodal
Model Types
VLM, transformer
Threat Tags
inference_time
Datasets
HydraFake, FaceForensics++, Celeb-DF, DFDC, WildDeepfake
Applications
deepfake detection, facial image authentication, AI-generated content detection