
SynthForensics: A Multi-Generator Benchmark for Detecting Synthetic Video Deepfakes

Roberto Leotta 1, Salvatore Alfio Sambataro 2, Claudio Vittorio Ragaglia 2, Mirko Casu 2, Yuri Petralia 1, Francesco Guarnera 2, Luca Guarnera 2, Sebastiano Battiato 2

0 citations · 57 references · arXiv (Cornell University)


Published on arXiv

2602.04939

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

State-of-the-art deepfake detectors show a mean AUC drop of 29.19% on T2V synthetic videos, with some performing worse than random chance; training on SynthForensics recovers generalization to 93.81% AUC on unseen generators.

SynthForensics

Novel technique introduced


The landscape of synthetic media has been irrevocably altered by text-to-video (T2V) models, whose outputs are rapidly approaching indistinguishability from reality. Critically, this technology is no longer confined to large-scale labs; the proliferation of efficient, open-source generators is democratizing the ability to create high-fidelity synthetic content on consumer-grade hardware. This renders existing face-centric and manipulation-based benchmarks obsolete. To address this urgent threat, we introduce SynthForensics, to the best of our knowledge the first human-centric benchmark for detecting purely synthetic video deepfakes. The benchmark comprises 6,815 unique videos from five architecturally distinct, state-of-the-art open-source T2V models. Its construction was underpinned by a meticulous two-stage, human-in-the-loop validation process to ensure high semantic and visual quality. Each video is provided in four versions (raw, lossless, light, and heavy compression) to enable real-world robustness testing. Experiments demonstrate that state-of-the-art detectors are fragile and generalize poorly on this new domain: we observe a mean performance drop of $29.19\%$ AUC, with some methods performing worse than random chance and top models losing over 30 AUC points under heavy compression. The paper further investigates training on SynthForensics as a means to mitigate these performance gaps, achieving robust generalization to unseen generators ($93.81\%$ AUC), though at the cost of reduced backward compatibility with traditional manipulation-based deepfakes. The complete dataset and all generation metadata, including the specific prompts and inference parameters for every video, will be made publicly available at [link anonymized for review].
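
To make the headline numbers concrete, the sketch below shows how a cross-domain AUC drop of the kind reported above can be computed. It is a minimal illustration, assuming a hypothetical detector interface (`detector.score`) and datasets packaged as (videos, labels) pairs; none of these names come from the paper or any released SynthForensics API.

```python
from sklearn.metrics import roc_auc_score

def auc(detector, videos, labels):
    """ROC AUC of one detector's scores (label 1 = synthetic)."""
    # Hypothetical API: higher score = more likely fake.
    scores = [detector.score(v) for v in videos]
    return roc_auc_score(labels, scores)

def mean_auc_drop(detectors, source_set, target_set):
    """Average per-detector AUC loss, in percentage points, when moving
    from a source domain (e.g. FF++) to the T2V domain (e.g. SynthForensics)."""
    drops = [auc(d, *source_set) - auc(d, *target_set) for d in detectors]
    return 100 * sum(drops) / len(drops)
```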


Key Contributions

  • SynthForensics: the first human-centric benchmark for purely synthetic video deepfake detection, comprising 6,815 videos from five distinct open-source T2V models with four compression variants each (a re-encoding sketch follows this list)
  • Two-stage human-in-the-loop validation pipeline ensuring high semantic and visual quality of synthetic videos paired with real source videos
  • Empirical evaluation revealing state-of-the-art detectors suffer a mean AUC drop of 29.19% on synthetic T2V deepfakes, with training on SynthForensics achieving 93.81% AUC generalization to unseen generators
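
The four per-video versions named in the first contribution can be reproduced with standard re-encoding. Below is a minimal sketch assuming H.264 via ffmpeg with illustrative CRF values; the paper's exact codec parameters are not stated here, so treat these settings as placeholders.

```python
import subprocess
from pathlib import Path

# Assumed H.264/CRF settings (illustrative only, not the paper's published parameters).
VARIANTS = {
    "lossless": ["-c:v", "libx264", "-crf", "0"],
    "light":    ["-c:v", "libx264", "-crf", "23"],
    "heavy":    ["-c:v", "libx264", "-crf", "40"],
}

def make_variants(raw_path: str, out_dir: str) -> None:
    """Re-encode one raw video into the three compressed variants."""
    src = Path(raw_path)
    for name, codec_args in VARIANTS.items():
        dst = Path(out_dir) / f"{src.stem}_{name}.mp4"
        subprocess.run(["ffmpeg", "-y", "-i", str(src), *codec_args, str(dst)], check=True)
```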

🛡️ Threat Analysis

Output Integrity Attack

Directly addresses AI-generated content detection: specifically, detecting purely synthetic video deepfakes produced by text-to-video models. The benchmark evaluates content authenticity and the ability to distinguish real from AI-generated video, which falls squarely under output integrity and provenance verification. The paper also proposes training strategies to improve detector generalization across unseen generators.
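
One natural way to operationalize "generalization across unseen generators" is a leave-one-generator-out protocol: train on four of the five T2V models and test on the held-out fifth. The paper's exact protocol is not detailed here, so the following is only a sketch with hypothetical `train_detector` and `auc` helpers (the latter as in the earlier AUC sketch).

```python
def leave_one_generator_out(generator_sets, train_detector, auc):
    """generator_sets maps generator name -> (videos, labels).
    Returns the held-out AUC for each of the five generators."""
    results = {}
    for held_out in generator_sets:
        # Train on all generators except the held-out one.
        train_split = {g: data for g, data in generator_sets.items() if g != held_out}
        detector = train_detector(train_split)  # hypothetical training helper
        videos, labels = generator_sets[held_out]
        results[held_out] = auc(detector, videos, labels)
    return results
```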


Details

Domains
vision, generative
Model Types
diffusion, transformer
Threat Tags
inference_time, digital
Datasets
FaceForensics++ (FF++), DeepFakeDetection (DFD), SynthForensics
Applications
video deepfake detection, synthetic media forensics, content authenticity