
Learning Human-Perceived Fakeness in AI-Generated Videos via Multimodal LLMs

Xingyu Fu 1,2, Siyi Liu 1,2, Yinuo Xu 1,2, Pan Lu 3, Guangqiuse Hu 1,2, Tianbo Yang 1,2, Taran Anantasagar 1,2, Christopher Shen 1,2, Yikai Mao 1,2, Yuanzhe Liu 1,2, Keyush Shah 1,2, Chung Un Lee 1,2, Yejin Choi 3, James Zou 3, Dan Roth 2, Chris Callison-Burch 2

2 citations · 2 influential · 37 references · arXiv


Published on arXiv: 2509.22646

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

A 7B multimodal reward model trained on DeeptraceReward outperforms GPT-5 by 34.7% on average across fake clue identification, spatial grounding, and temporal labeling.

DeeptraceReward

Novel technique introduced


Can humans identify AI-generated (fake) videos and provide grounded reasons? While video generation models have advanced rapidly, a critical dimension -- whether humans can detect deepfake traces within a generated video, i.e., spatiotemporally grounded visual artifacts that reveal a video as machine-generated -- has been largely overlooked. We introduce DeeptraceReward, the first fine-grained, spatially and temporally aware benchmark that annotates human-perceived fake traces for video generation reward. The dataset comprises 4.3K detailed annotations across 3.3K high-quality generated videos. Each annotation provides a natural-language explanation, pinpoints a bounding-box region containing the perceived trace, and marks precise onset and offset timestamps. We consolidate these annotations into 9 major categories of deepfake traces that lead humans to identify a video as AI-generated, and train multimodal language models (LMs) as reward models to mimic human judgments and localizations. On DeeptraceReward, our 7B reward model outperforms GPT-5 by 34.7% on average across fake clue identification, grounding, and explanation. Interestingly, we observe a consistent difficulty gradient: binary fake vs. real classification is substantially easier than fine-grained deepfake trace detection; within the latter, performance degrades from natural-language explanation (easiest), to spatial grounding, to temporal labeling (hardest). By foregrounding human-perceived deepfake traces, DeeptraceReward provides a rigorous testbed and training signal for socially aware and trustworthy video generation.


Key Contributions

  • DeeptraceReward: first fine-grained spatiotemporally grounded benchmark with 4.3K human annotations of deepfake traces across 3.3K AI-generated videos, including bounding boxes and onset/offset timestamps
  • Taxonomy of 9 major categories of deepfake traces that lead humans to identify AI-generated video
  • 7B multimodal reward model trained on the benchmark that outperforms GPT-5 by 34.7% on average across fake clue identification, spatial grounding, and temporal labeling
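To make the annotation format concrete, here is a minimal sketch of what one DeeptraceReward-style record could look like, with a sanity check on its spatial and temporal grounding. The field names, class, and category string are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass


@dataclass
class TraceAnnotation:
    """One human annotation of a perceived deepfake trace (hypothetical schema)."""
    explanation: str   # natural-language reason the clip looks AI-generated
    bbox: tuple        # (x1, y1, x2, y2) in pixels, containing the trace
    onset_s: float     # time the trace first appears, in seconds
    offset_s: float    # time the trace disappears, in seconds
    category: str      # one of the paper's 9 major trace categories


def is_valid(a: TraceAnnotation, width: int, height: int) -> bool:
    """Check that the bounding box lies inside the frame and onset precedes offset."""
    x1, y1, x2, y2 = a.bbox
    spatial_ok = 0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height
    temporal_ok = 0 <= a.onset_s < a.offset_s
    return spatial_ok and temporal_ok


# Illustrative example record (not taken from the dataset)
ann = TraceAnnotation(
    explanation="Fingers merge unnaturally while the hand is waving",
    bbox=(120.0, 80.0, 260.0, 210.0),
    onset_s=1.4,
    offset_s=2.9,
    category="anatomical implausibility",  # hypothetical category name
)
print(is_valid(ann, width=1280, height=720))  # True for this example
```

A reward model trained on such records must produce all three components -- the explanation, the box, and the time span -- which is why the paper can report the difficulty gradient from explanation to grounding to temporal labeling separately.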

🛡️ Threat Analysis

Output Integrity Attack

Paper directly addresses AI-generated video (deepfake) detection — a canonical ML09 task. It introduces a new benchmark (DeeptraceReward) with annotated deepfake trace categories, spatial bounding boxes, and temporal markers, and trains multimodal LMs as detection/reward models, constituting a novel forensic framework for AI-generated content, not merely applying an existing detector to a new domain.


Details

Domains
vision · multimodal · generative
Model Types
vlm · multimodal
Threat Tags
inference_time
Datasets
DeeptraceReward (introduced in paper)
Applications
ai-generated video detection · deepfake detection · video generation quality assessment