benchmark 2025

Learning Human-Perceived Fakeness in AI-Generated Videos via Multimodal LLMs

Xingyu Fu ¹,², Siyi Liu ¹,², Yinuo Xu ¹,², Pan Lu ³, Guangqiuse Hu ¹,², Tianbo Yang ¹,², Taran Anantasagar ¹,², Christopher Shen ¹,², Yikai Mao ¹,², Yuanzhe Liu ¹,², Keyush Shah ¹,², Chung Un Lee ¹,², Yejin Choi ³, James Zou ³, Dan Roth ², Chris Callison-Burch ²

¹ Princeton University

² University of Pennsylvania

³ Stanford University

2 citations · 2 influential · 37 references · arXiv

Published on arXiv

2509.22646

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

A 7B multimodal reward model trained on DeeptraceReward outperforms GPT-5 by 34.7% on average across fake clue identification, grounding, and explanation.

DeeptraceReward

Novel technique introduced

Can humans identify AI-generated (fake) videos and provide grounded reasons? While video generation models have advanced rapidly, a critical dimension has been largely overlooked: whether humans can detect deepfake traces within a generated video, i.e., spatiotemporally grounded visual artifacts that reveal a video as machine generated. We introduce DeeptraceReward, the first fine-grained, spatially and temporally aware benchmark that annotates human-perceived fake traces for video generation reward. The dataset comprises 4.3K detailed annotations across 3.3K high-quality generated videos. Each annotation provides a natural-language explanation, pinpoints a bounding-box region containing the perceived trace, and marks precise onset and offset timestamps. We consolidate these annotations into 9 major categories of deepfake traces that lead humans to identify a video as AI-generated, and train multimodal language models (LMs) as reward models to mimic human judgments and localizations. On DeeptraceReward, our 7B reward model outperforms GPT-5 by 34.7% on average across fake clue identification, grounding, and explanation. Interestingly, we observe a consistent difficulty gradient: binary fake vs. real classification is substantially easier than fine-grained deepfake trace detection; within the latter, performance degrades from natural language explanations (easiest), to spatial grounding, to temporal labeling (hardest). By foregrounding human-perceived deepfake traces, DeeptraceReward provides a rigorous testbed and training signal for socially aware and trustworthy video generation.
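The annotation format described in the abstract (an explanation, a bounding box, and onset/offset timestamps) can be pictured as a small record type. The sketch below is only an illustration: the class name `TraceAnnotation`, its field names, the pixel-coordinate box layout, and the example values are all hypothetical and are not taken from the DeeptraceReward release.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TraceAnnotation:
    """One human-perceived fake trace in a generated video (hypothetical schema)."""

    video_id: str
    explanation: str  # natural-language reason the region looks fake
    bbox: Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)
    onset_s: float  # time in seconds at which the trace becomes visible
    offset_s: float  # time in seconds at which the trace stops being visible

    def __post_init__(self) -> None:
        # Reject degenerate boxes and reversed or negative time spans.
        x_min, y_min, x_max, y_max = self.bbox
        if not (x_min < x_max and y_min < y_max):
            raise ValueError("bbox must satisfy x_min < x_max and y_min < y_max")
        if not (0 <= self.onset_s <= self.offset_s):
            raise ValueError("need 0 <= onset_s <= offset_s")

    @property
    def duration_s(self) -> float:
        """Length of the annotated time span in seconds."""
        return self.offset_s - self.onset_s


# Made-up example record, for illustration only.
example = TraceAnnotation(
    video_id="vid_0001",
    explanation="The left hand merges into the cup handle.",
    bbox=(120.0, 80.0, 260.0, 210.0),
    onset_s=1.5,
    offset_s=2.75,
)
```

Validating the box and the time span at construction time keeps malformed records out of any later scoring step.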

Key Contributions

DeeptraceReward: first fine-grained spatiotemporally grounded benchmark with 4.3K human annotations of deepfake traces across 3.3K AI-generated videos, including bounding boxes and onset/offset timestamps
Taxonomy of 9 major categories of deepfake traces that lead humans to identify AI-generated video
7B multimodal reward model trained on the benchmark that outperforms GPT-5 by 34.7% on average across fake clue identification, grounding, and explanation
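This summary does not state which metrics the paper uses to score grounding. As a rough illustration only, intersection-over-union (IoU) is a common way to compare a predicted bounding box or time span against a human annotation. The function names `box_iou` and `interval_iou`, and the `(x_min, y_min, x_max, y_max)` and `(onset, offset)` layouts, are assumptions made for this sketch.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)
Span = Tuple[float, float]  # assumed (onset_s, offset_s)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # Overlap width and height, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def interval_iou(a: Span, b: Span) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


# Compare a hypothetical model prediction against a hypothetical human annotation.
spatial_score = box_iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
temporal_score = interval_iou((0.0, 2.0), (1.0, 3.0))
```

Both scores lie in [0, 1], so spatial and temporal localization can be reported on a common scale, whatever the benchmark's official protocol turns out to be.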

🛡️ Threat Analysis

Output Integrity Attack

The paper directly addresses AI-generated video (deepfake) detection, a canonical ML09 task. It introduces a new benchmark (DeeptraceReward) with annotated deepfake trace categories, spatial bounding boxes, and temporal markers, and trains multimodal LMs as detection/reward models. Together these constitute a novel forensic framework for AI-generated content rather than an application of an existing detector to a new domain.

Details

Domains

vision, multimodal, generative

Model Types

vlm, multimodal

Threat Tags

inference_time

Datasets

DeeptraceReward (introduced in paper)

Applications

ai-generated video detection, deepfake detection, video generation quality assessment

Similar Papers

AEGIS: Authenticity Evaluation Benchmark for AI-Generated Video Sequences

VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL

Training-free Source Attribution of AI-generated Images via Resynthesis

Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images

WMVLM: Evaluating Diffusion Model Image Watermarking via Vision-Language Models

The Deepfake Detective: Interpreting Neural Forensics Through Sparse Features and Manifolds

DiffSeg30k: A Multi-Turn Diffusion Editing Benchmark for Localized AIGC Detection

A Comprehensive Dataset for Human vs. AI Generated Image Detection