
VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL

Kyoungjun Park 1, Yifan Yang 2, Juheon Yi 2, Shicheng Zheng 1, Yifei Shen 2, Dongqi Han 2, Caihua Shan 2, Muhammad Muaz 2, Lili Qiu 1,2

2 citations · 1 influential · 58 references · arXiv

Published on arXiv: 2510.02282

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves state-of-the-art zero-shot performance on existing benchmarks, with accuracy above 95% after additional training, alongside interpretable temporal-artifact rationales.

Novel technique introduced: VidGuard-R1


With the rapid advancement of AI-generated videos, there is an urgent need for effective detection tools to mitigate societal risks such as misinformation and reputational harm. In addition to accurate classification, it is essential that detection models provide interpretable explanations to ensure transparency for regulators and end users. To address these challenges, we introduce VidGuard-R1, the first video authenticity detector that fine-tunes a multi-modal large language model (MLLM) using group relative policy optimization (GRPO). Our model delivers both highly accurate judgments and insightful reasoning. We curate a challenging dataset of 140k real and AI-generated videos produced by state-of-the-art generation models, carefully designing the generation process to maximize discrimination difficulty. We then fine-tune Qwen-VL using GRPO with two specialized reward models that target temporal artifacts and generation complexity. Extensive experiments demonstrate that VidGuard-R1 achieves state-of-the-art zero-shot performance on existing benchmarks, with additional training pushing accuracy above 95%. Case studies further show that VidGuard-R1 produces precise and interpretable rationales behind its predictions. The code is publicly available at https://VidGuard-R1.github.io.
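The core recipe above is GRPO fine-tuning driven by two reward signals (temporal artifacts and generation complexity). A minimal sketch of how such rewards might be combined and turned into group-relative advantages, assuming a simple weighted sum of the two signals and the standard group-normalized advantage used in GRPO (the paper's exact reward shaping and weights are not specified here):

```python
import statistics


def combined_reward(temporal_score: float, complexity_score: float,
                    w_temporal: float = 0.5, w_complexity: float = 0.5) -> float:
    """Weighted sum of the two reward-model outputs.

    The equal weights are an illustrative assumption, not the paper's values.
    """
    return w_temporal * temporal_score + w_complexity * complexity_score


def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages: each sampled response's reward is
    standardized against the mean and std of its sampling group,
    which is the core idea of GRPO (no learned value critic)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All responses in the group scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


# Example: score a group of sampled responses for one video, then
# compute the per-response advantages used to weight the policy update.
group_rewards = [combined_reward(t, c) for t, c in [(0.9, 0.7), (0.2, 0.4), (0.6, 0.6)]]
advantages = grpo_advantages(group_rewards)
```

The advantages always sum to (approximately) zero within a group, so responses are rewarded only relative to their peers sampled for the same input.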


Key Contributions

  • First MLLM-based video authenticity detector fine-tuned with GRPO, yielding both accurate classification and interpretable reasoning
  • Two specialized reward models targeting temporal artifacts and generation complexity to guide RL-based fine-tuning
  • Curated challenging 140k video dataset from state-of-the-art generators, designed to maximize discrimination difficulty

🛡️ Threat Analysis

Output Integrity Attack

Directly addresses AI-generated content detection (video deepfakes), proposing a novel detection architecture — MLLM fine-tuned with GRPO and specialized reward models targeting temporal artifacts and generation complexity — to authenticate video content provenance.


Details

Domains
vision, multimodal, generative
Model Types
vlm, multimodal, transformer, rl
Threat Tags
inference_time
Datasets
VidGuard-140k (curated), existing AI-generated video benchmarks
Applications
ai-generated video detection, video deepfake detection, content authenticity verification