AEGIS: Authenticity Evaluation Benchmark for AI-Generated Video Sequences
Jieyu Li 1,2, Xin Zhang 2, Joey Tianyi Zhou 2
Published on arXiv: 2508.10771
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
State-of-the-art VLMs show limited detection accuracy on AEGIS's most challenging subsets, revealing that existing models fail to generalize against hyper-realistic AI-generated video forgeries.
AEGIS
Novel technique introduced
Recent advances in AI-generated content have fueled the rise of highly realistic synthetic videos, posing severe risks to societal trust and digital integrity. Existing benchmarks for video authenticity detection typically suffer from limited realism, insufficient scale, and inadequate complexity, and so fail to effectively evaluate modern vision-language models against sophisticated forgeries. To address this gap, we introduce AEGIS, a novel large-scale benchmark explicitly targeting the detection of hyper-realistic and semantically nuanced AI-generated videos. AEGIS comprises over 10,000 rigorously curated real and synthetic videos generated by diverse, state-of-the-art generative models, including Stable Video Diffusion, CogVideoX-5B, KLing, and Sora, encompassing open-source and proprietary architectures. In particular, AEGIS features specially constructed challenging subsets enhanced with robustness evaluation. Furthermore, we provide multimodal annotations spanning Semantic-Authenticity Descriptions, Motion Features, and Low-level Visual Features, facilitating authenticity detection and supporting downstream tasks such as multimodal fusion and forgery localization. Extensive experiments using advanced vision-language models demonstrate limited detection capabilities on the most challenging subsets of AEGIS, highlighting a level of complexity and realism in the dataset that exceeds the current generalization capabilities of existing models. In essence, AEGIS establishes an evaluation benchmark that advances research toward robust, reliable, and broadly generalizable video authenticity detection methods capable of addressing real-world forgery threats. Our dataset is available at https://huggingface.co/datasets/Clarifiedfish/AEGIS.
Key Contributions
- AEGIS: a large-scale dataset of 10,000+ real and synthetic videos from diverse state-of-the-art generative models (Stable Video Diffusion, CogVideoX-5B, KLing, Sora), including challenging subsets generated from GPT-4o-refined prompts
- Rich multimodal annotations including Semantic-Authenticity Descriptions, Motion Features, and Low-level Visual Features supporting both detection and downstream tasks
- Extensive evaluation demonstrating that current VLMs have severely limited detection capability on the most challenging AEGIS subsets, exposing a critical gap in generalizable video authenticity detection
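The evaluation described above compares detection performance across subsets of differing difficulty. As a minimal sketch of that kind of comparison, the following computes binary real-vs-synthetic accuracy per subset. The record layout, the subset names `standard` and `hard`, the label strings, and the toy predictions are all assumptions made for illustration; they are not the AEGIS schema and not results from the paper.

```python
from collections import defaultdict


def per_subset_accuracy(records):
    """Return {subset_name: accuracy} for binary authenticity predictions.

    `records` is an iterable of (subset, true_label, predicted_label)
    tuples. This layout is hypothetical, chosen only for this sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for subset, truth, pred in records:
        total[subset] += 1
        correct[subset] += int(truth == pred)
    return {name: correct[name] / total[name] for name in total}


# Hypothetical toy predictions, NOT figures reported for AEGIS.
toy_records = [
    ("standard", "real", "real"),
    ("standard", "synthetic", "synthetic"),
    ("hard", "synthetic", "real"),
    ("hard", "real", "real"),
    ("hard", "synthetic", "real"),
    ("hard", "synthetic", "synthetic"),
]

scores = per_subset_accuracy(toy_records)
for name, acc in scores.items():
    print(f"{name}: {acc:.2f}")
```

Reporting accuracy per subset rather than one pooled number keeps a weak score on the hard subset from being masked by easier examples.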
🛡️ Threat Analysis
Directly targets AI-generated content detection: the benchmark evaluates how well models distinguish synthetic from authentic videos, a core output integrity and content provenance task under ML09.