tool 2025

Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning

Yifei Li, Wenzhao Zheng, Yanran Zhang, Runze Sun, Yu Zheng, Lei Chen, Jie Zhou, Jiwen Lu

4 citations · 3 influential · 90 references · arXiv

Published on arXiv: 2512.15693

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Skyra surpasses existing AI-generated video detection methods across multiple benchmarks while providing human-interpretable artifact-grounded explanations for its predictions.

Skyra

Novel technique introduced


The misuse of AI-driven video generation technologies has raised serious social concerns, highlighting the urgent need for reliable AI-generated video detectors. However, most existing methods are limited to binary classification and lack the necessary explanations for human interpretation. In this paper, we present Skyra, a specialized multimodal large language model (MLLM) that identifies human-perceivable visual artifacts in AI-generated videos and leverages them as grounded evidence for both detection and explanation. To support this objective, we construct ViF-CoT-4K for Supervised Fine-Tuning (SFT), which represents the first large-scale AI-generated video artifact dataset with fine-grained human annotations. We then develop a two-stage training strategy that systematically enhances our model's spatio-temporal artifact perception, explanation capability, and detection accuracy. To comprehensively evaluate Skyra, we introduce ViF-Bench, a benchmark comprising 3K high-quality samples generated by over ten state-of-the-art video generators. Extensive experiments demonstrate that Skyra surpasses existing methods across multiple benchmarks, while our evaluation yields valuable insights for advancing explainable AI-generated video detection.


Key Contributions

  • Skyra: a specialized MLLM that detects AI-generated videos by identifying human-perceivable visual artifacts and using them as grounded evidence for both detection and explanation
  • ViF-CoT-4K: the first large-scale AI-generated video artifact dataset with fine-grained human annotations for supervised fine-tuning
  • ViF-Bench: a benchmark of 3K high-quality samples from 10+ state-of-the-art video generators for comprehensive evaluation of explainable AI-generated video detection

🛡️ Threat Analysis

Output Integrity Attack

Skyra is an AI-generated content detection system that identifies visual artifacts in synthetic videos to authenticate content provenance — directly targeting output integrity. The paper contributes a novel detection architecture (not just a domain application), a new training dataset (ViF-CoT-4K), and a new benchmark (ViF-Bench), all focused on detecting AI-generated video content.


Details

Domains
vision, multimodal
Model Types
vlm, transformer
Threat Tags
inference_time
Datasets
ViF-CoT-4K, ViF-Bench
Applications
ai-generated video detection, deepfake video detection, video content authentication