AEGIS: Authenticity Evaluation Benchmark for AI-Generated Video Sequences

Recent advances in AI-generated content have fueled the rise of highly realistic synthetic videos, posing severe risks to societal trust and digital integrity. Existing benchmarks for video authenticity detection typically suffer from limited realism, insufficient scale, and inadequate complexity, failing to effectively evaluate modern vision-language models against sophisticated forgeries. To address this critical gap, we introduce AEGIS, a novel large-scale benchmark explicitly targeting the detection of hyper-realistic and semantically nuanced AI-generated videos. AEGIS comprises over 10,000 rigorously curated real and synthetic videos, the latter generated by diverse, state-of-the-art generative models, including Stable Video Diffusion, CogVideoX-5B, KLing, and Sora, encompassing open-source and proprietary architectures. In particular, AEGIS features specially constructed challenging subsets enhanced with robustness evaluation. Furthermore, we provide multimodal annotations spanning Semantic-Authenticity Descriptions, Motion Features, and Low-level Visual Features, facilitating authenticity detection and supporting downstream tasks such as multimodal fusion and forgery localization. Extensive experiments using advanced vision-language models demonstrate limited detection capabilities on the most challenging subsets of AEGIS, highlighting the dataset's unique complexity and realism beyond the current generalization capabilities of existing models. In essence, AEGIS establishes an indispensable evaluation benchmark, fundamentally advancing research toward developing genuinely robust, reliable, and broadly generalizable video authenticity detection methodologies capable of addressing real-world forgery threats. Our dataset is available at https://huggingface.co/datasets/Clarifiedfish/AEGIS.

Key Contributions

AEGIS: a large-scale dataset of 10,000+ real and synthetic videos from diverse state-of-the-art generative models (Stable Video Diffusion, CogVideoX-5B, KLing, Sora) with challenging subsets using GPT-4o-refined prompts
Rich multimodal annotations including Semantic-Authenticity Descriptions, Motion Features, and Low-level Visual Features supporting both detection and downstream tasks
Extensive evaluation demonstrating that current VLMs have severely limited detection capability on the most challenging AEGIS subsets, exposing a critical gap in generalizable video authenticity detection
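
The third contribution compares detector performance across subsets of differing difficulty. As a rough illustration of how such a comparison might be scored, the sketch below computes per-subset accuracy and F1 for a binary real-versus-synthetic detector. It is not the authors' evaluation code. The Record type, the subset names, the label convention (1 = synthetic, 0 = real), and the choice of accuracy and F1 as metrics are all assumptions made for illustration, and the sample records are toy data, not results from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Record(NamedTuple):
    """One evaluated video (hypothetical structure, not the AEGIS schema)."""
    subset: str      # hypothetical subset name, e.g. "easy" or "hard"
    label: int       # assumed convention: 1 = synthetic, 0 = real
    prediction: int  # detector output, same convention as label


def per_subset_metrics(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """Return accuracy and F1 (synthetic = positive class) for each subset."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for r in records:
        c = counts[r.subset]
        if r.label == 1:
            c["tp" if r.prediction == 1 else "fn"] += 1
        else:
            c["fp" if r.prediction == 1 else "tn"] += 1

    results: Dict[str, Dict[str, float]] = {}
    for subset, c in counts.items():
        total = c["tp"] + c["fp"] + c["tn"] + c["fn"]
        accuracy = (c["tp"] + c["tn"]) / total
        # Guard against empty denominators when a class is never predicted.
        predicted_pos = c["tp"] + c["fp"]
        actual_pos = c["tp"] + c["fn"]
        precision = c["tp"] / predicted_pos if predicted_pos else 0.0
        recall = c["tp"] / actual_pos if actual_pos else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        results[subset] = {"accuracy": accuracy, "f1": f1}
    return results


# Toy records for illustration only; these are not results from the paper.
toy_records = [
    Record("easy", 1, 1), Record("easy", 0, 0),
    Record("easy", 1, 1), Record("easy", 0, 0),
    Record("hard", 1, 0), Record("hard", 1, 1),
    Record("hard", 0, 0), Record("hard", 0, 1),
]
toy_metrics = per_subset_metrics(toy_records)
```

Reporting metrics per subset rather than pooled keeps a strong score on easier videos from masking weak performance on the challenging ones, which is the gap the contribution describes.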

🛡️ Threat Analysis

Output Integrity Attack

Directly targets AI-generated content detection: the benchmark evaluates how well models distinguish synthetic from authentic videos, a core output integrity and content provenance task under ML09.

Details

Domains

vision, multimodal, generative

Model Types

vlm, diffusion, transformer

Threat Tags

inference_time

Datasets

AEGIS, VBench, EvalCrafter, AIGCBench, GenVidBench, DeMamba

Applications

2025 0 cit.