TriDF: Evaluating Perception, Detection, and Hallucination for Interpretable DeepFake Detection
Jian-Yu Jiang-Lin 1, Kang-Yang Huang 1, Ling Zou 1, Ling Lo 2, Sheng-Ping Yang 1, Yu-Wen Tseng 1, Kun-Hsiang Lin 1, Chia-Ling Chen 1, Yu-Ting Ta 1, Yanting Wang 1, Po-Ching Chen 1, Hongxia Xie 3, Hong-Han Shuai 2, Wen-Huang Cheng 1
Published on arXiv (2512.10652)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Experiments on state-of-the-art MLLMs reveal that accurate perception is essential for reliable detection, but hallucination in model explanations can severely disrupt decision-making, highlighting the interdependence of these three evaluation dimensions.
TriDF
Novel technique introduced
Advances in generative modeling have made it increasingly easy to fabricate realistic portrayals of individuals, creating serious risks for security, communication, and public trust. Detecting such person-driven manipulations requires systems that not only distinguish altered content from authentic media but also provide clear and reliable reasoning. In this paper, we introduce TriDF, a comprehensive benchmark for interpretable DeepFake detection. TriDF contains high-quality forgeries from advanced synthesis models, covering 16 DeepFake types across image, video, and audio modalities. The benchmark evaluates three key aspects: Perception, which measures the ability of a model to identify fine-grained manipulation artifacts using human-annotated evidence; Detection, which assesses classification performance across diverse forgery families and generators; and Hallucination, which quantifies the reliability of model-generated explanations. Experiments on state-of-the-art multimodal large language models show that accurate perception is essential for reliable detection, but hallucination can severely disrupt decision-making, revealing the interdependence of these three aspects. TriDF provides a unified framework for understanding the interaction between detection accuracy, evidence identification, and explanation reliability, offering a foundation for building trustworthy systems that address real-world synthetic media threats.
Key Contributions
- TriDF benchmark with 5K high-quality DeepFake samples spanning 16 manipulation types across image, video, and audio modalities
- Three-dimensional evaluation framework measuring Perception (artifact identification), Detection (classification), and Hallucination (explanation reliability) for interpretable deepfake detectors
- Empirical finding that hallucination in MLLM-generated explanations can severely disrupt detection decision-making even when perceptual ability is adequate
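The three evaluation dimensions above can be sketched as a simple scoring loop. This is a minimal illustrative example, not the paper's actual protocol: the `Sample` fields, the micro-averaged artifact overlap used for Perception, and the unsupported-citation rate used for Hallucination are all assumptions for the sake of the sketch.

```python
# Hypothetical sketch of TriDF-style three-dimensional scoring.
# Field names and metric definitions are illustrative assumptions,
# not the benchmark's actual evaluation protocol.
from dataclasses import dataclass, field


@dataclass
class Sample:
    label: str                              # ground truth: "real" or "fake"
    predicted: str                          # model's classification
    evidence: set = field(default_factory=set)         # human-annotated artifacts
    cited_artifacts: set = field(default_factory=set)  # artifacts the model's explanation cites


def evaluate(samples):
    """Return (perception, detection, hallucination) rates in [0, 1]."""
    # Detection: plain classification accuracy.
    det_hits = sum(s.predicted == s.label for s in samples)

    # Perception: fraction of annotated artifacts the model's
    # explanation recovered, micro-averaged over all samples.
    recovered = sum(len(s.cited_artifacts & s.evidence) for s in samples)
    annotated = sum(len(s.evidence) for s in samples)

    # Hallucination: fraction of cited artifacts with no
    # annotated support (fabricated evidence).
    fabricated = sum(len(s.cited_artifacts - s.evidence) for s in samples)
    cited = sum(len(s.cited_artifacts) for s in samples)

    return (
        recovered / annotated if annotated else 0.0,
        det_hits / len(samples),
        fabricated / cited if cited else 0.0,
    )


samples = [
    # Correct detection; explanation recovers one of two annotated artifacts.
    Sample("fake", "fake", {"blended boundary", "eye asymmetry"},
           {"blended boundary"}),
    # Wrong detection driven by a fabricated artifact (hallucination).
    Sample("real", "fake", set(), {"lip-sync drift"}),
]
perception, detection, hallucination = evaluate(samples)
```

On this toy pair the second sample shows the paper's key interaction: a hallucinated artifact ("lip-sync drift") accompanies, and plausibly causes, the misclassification of authentic media.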
🛡️ Threat Analysis
TriDF is a benchmark specifically designed to evaluate AI-generated content detection (deepfakes across image, video, and audio), directly targeting output integrity and content authenticity — a canonical ML09 application.