benchmark 2025

AEGIS: Authenticity Evaluation Benchmark for AI-Generated Video Sequences

Jieyu Li 1,2, Xin Zhang 2, Joey Tianyi Zhou 2

0 citations

Published on arXiv: 2508.10771

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

State-of-the-art VLMs show limited detection accuracy on AEGIS's most challenging subsets, revealing that existing models fail to generalize against hyper-realistic AI-generated video forgeries.

AEGIS

Novel technique introduced


Recent advances in AI-generated content have fueled the rise of highly realistic synthetic videos, posing severe risks to societal trust and digital integrity. Existing benchmarks for video authenticity detection typically suffer from limited realism, insufficient scale, and inadequate complexity, failing to effectively evaluate modern vision-language models against sophisticated forgeries. To address this critical gap, we introduce AEGIS, a novel large-scale benchmark explicitly targeting the detection of hyper-realistic and semantically nuanced AI-generated videos. AEGIS comprises over 10,000 rigorously curated real and synthetic videos generated by diverse, state-of-the-art generative models, including Stable Video Diffusion, CogVideoX-5B, KLing, and Sora, encompassing both open-source and proprietary architectures. In particular, AEGIS features specially constructed challenging subsets enhanced with robustness evaluation. Furthermore, we provide multimodal annotations spanning Semantic-Authenticity Descriptions, Motion Features, and Low-level Visual Features, facilitating authenticity detection and supporting downstream tasks such as multimodal fusion and forgery localization. Extensive experiments using advanced vision-language models demonstrate limited detection capabilities on the most challenging subsets of AEGIS, highlighting the dataset's unique complexity and realism beyond the current generalization capabilities of existing models. In essence, AEGIS establishes an indispensable evaluation benchmark, fundamentally advancing research toward developing genuinely robust, reliable, and broadly generalizable video authenticity detection methodologies capable of addressing real-world forgery threats. Our dataset is available at https://huggingface.co/datasets/Clarifiedfish/AEGIS.


Key Contributions

  • AEGIS: a large-scale dataset of 10,000+ real and synthetic videos from diverse state-of-the-art generative models (Stable Video Diffusion, CogVideoX-5B, KLing, Sora) with challenging subsets using GPT-4o-refined prompts
  • Rich multimodal annotations including Semantic-Authenticity Descriptions, Motion Features, and Low-level Visual Features supporting both detection and downstream tasks
  • Extensive evaluation demonstrating that current VLMs have severely limited detection capability on the most challenging AEGIS subsets, exposing a critical gap in generalizable video authenticity detection
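
The evaluation described in the last contribution comes down to scoring real-versus-synthetic verdicts separately for each subset. The following is a minimal sketch of that bookkeeping, not the authors' evaluation code. The `Prediction` record, the subset names "standard" and "hard", and the toy records are all hypothetical and chosen only for illustration; they are not results from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One detector verdict on one video (hypothetical record layout)."""
    subset: str                 # assumed subset name, e.g. "standard" or "hard"
    is_synthetic: bool          # ground-truth label
    predicted_synthetic: bool   # detector's verdict


def per_subset_accuracy(predictions):
    """Return {subset: fraction of verdicts that match the ground truth}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p in predictions:
        total[p.subset] += 1
        if p.is_synthetic == p.predicted_synthetic:
            correct[p.subset] += 1
    return {name: correct[name] / total[name] for name in total}


# Made-up toy records, only to exercise the function.
toy = [
    Prediction("standard", True, True),
    Prediction("standard", False, False),
    Prediction("standard", True, True),
    Prediction("standard", False, True),
    Prediction("hard", True, False),
    Prediction("hard", True, False),
    Prediction("hard", False, False),
    Prediction("hard", True, True),
]

scores = per_subset_accuracy(toy)
for name, acc in scores.items():
    print(f"{name}: {acc:.2f}")
```

Reporting accuracy per subset rather than as one pooled number keeps a weak score on a hard subset from being hidden by a strong score on easier videos.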

🛡️ Threat Analysis

Output Integrity Attack

Directly targets AI-generated content detection: the benchmark is designed to evaluate how well models distinguish synthetic from authentic videos, a core output integrity and content provenance task under ML09.


Details

Domains
vision, multimodal, generative
Model Types
vlm, diffusion, transformer
Threat Tags
inference_time
Datasets
AEGIS, VBench, EvalCrafter, AIGCBench, GenVidBench, DeMamba
Applications
ai-generated video detection, deepfake video detection, video authenticity verification