VideoVeritas: AI-Generated Video Detection via Perception Pretext Reinforcement Learning
Hao Tan 1,2, Jun Lan 2, Senyuan Shi 1, Zichang Tan 3, Zijian Yu 2, Huijia Zhu 2, Weiqiang Wang 2, Jun Wan 1,4, Zhen Lei 1,4
1 Institute of Automation, Chinese Academy of Sciences
3 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Published on arXiv
2602.08828
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
VideoVeritas achieves more balanced detection performance across diverse benchmarks than existing methods, which are biased toward either superficial reasoning or mechanical artifact analysis
PPRL (Perception Pretext Reinforcement Learning)
Novel technique introduced
The growing capability of video generation poses escalating security risks, making reliable detection increasingly essential. In this paper, we introduce VideoVeritas, a framework that integrates fine-grained perception and fact-based reasoning. We observe that while current multi-modal large language models (MLLMs) exhibit strong reasoning capacity, their granular perception ability remains limited. To mitigate this, we introduce Joint Preference Alignment and Perception Pretext Reinforcement Learning (PPRL). Specifically, rather than directly optimizing for the detection task, we adopt general spatiotemporal grounding and self-supervised object counting in the RL stage, enhancing detection performance with simple perception pretext tasks. To facilitate robust evaluation, we further introduce MintVid, a lightweight yet high-quality dataset containing 3K videos from 9 state-of-the-art generators, along with a real-world collected subset containing factual errors in content. Experimental results demonstrate that existing methods are biased toward either superficial reasoning or mechanical artifact analysis, while VideoVeritas achieves more balanced performance across diverse benchmarks.
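The core PPRL idea — rewarding simple perception pretext tasks (spatiotemporal grounding, self-supervised object counting) in the RL stage instead of the detection label itself — can be illustrated with a minimal sketch. The reward shapes below are illustrative assumptions, not the paper's actual reward design: a thresholded IoU reward for grounding and an exponentially decaying error reward for counting against a self-supervised pseudo label.

```python
import math

def box_iou(pred, gt):
    """Intersection-over-union between two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(pred) + area(gt) - inter
    return inter / union if union > 0 else 0.0

def grounding_reward(pred_box, gt_box, thresh=0.5):
    """Binary reward: 1 when the model's grounded box sufficiently
    overlaps the target region (hypothetical reward shape)."""
    return 1.0 if box_iou(pred_box, gt_box) >= thresh else 0.0

def counting_reward(pred_count, pseudo_count):
    """Reward decays with the absolute error against a self-supervised
    pseudo count, so no manual labels are required (hypothetical shape)."""
    return math.exp(-abs(pred_count - pseudo_count))
```

A policy trained with rewards of this form is optimized purely for perception accuracy; the paper's claim is that this indirect signal transfers to better-grounded detection reasoning.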
Key Contributions
- VideoVeritas framework combining Joint Preference Alignment and Perception Pretext Reinforcement Learning (PPRL) that uses spatiotemporal grounding and object counting pretext tasks to improve AIGC video detection without label-intensive annotation
- MintVid dataset with 3K videos from 9 state-of-the-art generators plus a real-world subset with factual errors for more robust evaluation
- Empirical demonstration that perception pretext RL tasks improve reasoning behavior and yield more balanced detection compared to methods biased toward superficial reasoning or mechanical analysis
🛡️ Threat Analysis
AI-generated video detection is explicitly covered under ML09 output integrity — specifically the 'AI-generated content detection (deepfake detection)' subcategory. The paper builds a detector to verify whether video content is real or synthetically generated.