benchmark 2026

Your One-Stop Solution for AI-Generated Video Detection

Long Ma 1, Zihao Xue 2, Yan Wang 3, Zhiyuan Yan 4, Jin Xu 1, Xiaorui Jiang 1, Haiyang Yu 1,5, Yong Liao 1, Zhen Bi 2

1 citation · 92 references · arXiv

Published on arXiv

2601.11035

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Comprehensive evaluation of 33 detectors reveals fundamental limitations in existing methods and identifies 4 novel findings to guide future AI-generated video detection research.

AIGVDBench

Novel technique introduced


Recent advances in generative modeling can create remarkably realistic synthetic videos, making it increasingly difficult for humans to distinguish them from real ones and necessitating reliable detection methods. However, two key limitations hinder the development of this field. From the dataset perspective, existing datasets are often limited in scale and constructed using outdated or narrowly scoped generative models, making it difficult to capture the diversity and rapid evolution of modern generative techniques. Moreover, the dataset construction process frequently prioritizes quantity over quality, neglecting essential aspects such as semantic diversity, scenario coverage, and technological representativeness. From the benchmark perspective, current benchmarks largely remain at the stage of dataset creation, leaving many fundamental issues unexamined and in-depth analyses yet to be carried out systematically. To address this gap, we propose AIGVDBench, a benchmark designed to be comprehensive and representative, covering 31 state-of-the-art generation models and over 440,000 videos. Drawing on more than 1,500 evaluations of 33 existing detectors belonging to four distinct categories, this work presents 8 in-depth analyses from multiple perspectives and identifies 4 novel findings that offer valuable insights for future research. We hope this work provides a solid foundation for advancing the field of AI-generated video detection. Our benchmark is open-sourced at https://github.com/LongMa-2025/AIGVDBench.


Key Contributions

  • AIGVDBench: a large-scale benchmark covering 31 generative models and 440,000+ videos with emphasis on semantic diversity and technological representativeness
  • Systematic evaluation of 33 existing AI-generated video detectors across four categories via 1,500+ experimental runs
  • 8 in-depth analyses and 4 novel findings that characterize key failure modes and open challenges in AI-generated video detection

🛡️ Threat Analysis

Output Integrity Attack

The paper directly addresses AI-generated content detection — specifically detecting synthetic videos produced by generative models. This is squarely output integrity and content provenance. The benchmark evaluates 33 detectors, providing a systematic framework to assess the state of the art in deepfake/AI-video detection.


Details

Domains
vision, generative
Model Types
diffusion, gan
Threat Tags
inference_time
Datasets
AIGVDBench (440,000+ videos, 31 generative models)
Applications
ai-generated video detection, deepfake detection, synthetic media forensics