defense 2026

Training-free Detection of Generated Videos via Spatial-Temporal Likelihoods

Omer Ben Hayun , Roy Betser , Meir Yossef Levi , Levi Kassel , Guy Gilboa

0 citations


Published on arXiv

2603.15026

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

STALL consistently outperforms prior image-based and video-based baselines in the zero-shot setting on two public benchmarks and on the new ComGenVid benchmark.

STALL

Novel technique introduced


Following major advances in text and image generation, video generation has surged, producing highly realistic and controllable sequences. This progress also raises serious concerns about misinformation, making reliable detection of synthetic videos increasingly crucial. Image-based detectors are fundamentally limited because they operate per frame and ignore temporal dynamics, while supervised video detectors generalize poorly to unseen generators, a critical drawback given the rapid emergence of new models. These challenges motivate zero-shot approaches, which avoid synthetic data and instead score content against real-data statistics, enabling training-free, model-agnostic detection. We introduce STALL, a simple, training-free, theoretically justified detector that provides likelihood-based scoring for videos, jointly modeling spatial and temporal evidence within a probabilistic framework. We evaluate STALL on two public benchmarks and introduce ComGenVid, a new benchmark with state-of-the-art generative models. STALL consistently outperforms prior image- and video-based baselines. Code and data are available at https://omerbenhayun.github.io/stall-video.


Key Contributions

  • Zero-shot, training-free video detector (STALL) that jointly models spatial and temporal evidence in a probabilistic framework
  • Likelihood-based scoring against real-data statistics, enabling model-agnostic detection without requiring synthetic training data
  • Introduction of ComGenVid benchmark with state-of-the-art generative video models
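
To make the idea of likelihood-based scoring against real-data statistics concrete, here is a minimal sketch. It is not the STALL implementation: it assumes each video has already been mapped to per-frame embeddings by some pretrained encoder, and it assumes, purely for illustration, that frame embeddings (spatial evidence) and frame-to-frame differences (temporal evidence) are each modeled by a Gaussian fitted on real videos only. All function names (`fit_gaussian`, `gaussian_loglik`, `fit_real_statistics`, `score_video`, `toy_video`) are hypothetical.

```python
import numpy as np


def fit_gaussian(x, eps=1e-3):
    """Fit a mean and a regularized covariance to the rows of x, shape (n, d)."""
    mu = x.mean(axis=0)
    cov = np.cov(x, rowvar=False) + eps * np.eye(x.shape[1])
    return mu, cov


def gaussian_loglik(x, mu, cov):
    """Gaussian log-likelihood of each row of x, shape (n, d)."""
    d = x.shape[1]
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)
    solved = np.linalg.solve(cov, diff.T).T
    maha = np.sum(diff * solved, axis=1)  # squared Mahalanobis distance
    return -0.5 * (maha + logdet + d * np.log(2.0 * np.pi))


def fit_real_statistics(real_videos):
    """Estimate spatial and temporal statistics from real videos only.

    real_videos: list of (T, d) arrays of per-frame embeddings (assumed input).
    """
    frames = np.concatenate(real_videos, axis=0)
    deltas = np.concatenate([np.diff(v, axis=0) for v in real_videos], axis=0)
    return fit_gaussian(frames), fit_gaussian(deltas)


def score_video(video, spatial, temporal):
    """Higher score = more consistent with real-data statistics.

    Assumption: spatial and temporal terms are simply summed with equal weight.
    """
    spatial_ll = gaussian_loglik(video, *spatial).mean()
    temporal_ll = gaussian_loglik(np.diff(video, axis=0), *temporal).mean()
    return spatial_ll + temporal_ll


def toy_video(rng, step_std, n_frames=16, dim=8):
    """Hypothetical stand-in for encoder output: a random walk in embedding space."""
    start = rng.normal(size=dim)
    steps = rng.normal(scale=step_std, size=(n_frames, dim))
    return start + np.cumsum(steps, axis=0)


rng = np.random.default_rng(0)
real_videos = [toy_video(rng, step_std=0.1) for _ in range(50)]
spatial, temporal = fit_real_statistics(real_videos)

smooth = toy_video(rng, step_std=0.1)   # temporal dynamics match the real set
jittery = toy_video(rng, step_std=1.0)  # temporal dynamics do not match
print(score_video(smooth, spatial, temporal), score_video(jittery, spatial, temporal))
```

No synthetic videos are used when fitting the statistics, which is what makes a scheme of this kind training-free and generator-agnostic. In practice a decision threshold on the score would have to be chosen on held-out real data; the equal weighting of the two terms here is an arbitrary choice for the sketch.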

🛡️ Threat Analysis

Output Integrity Attack

Proposes a detection method for AI-generated videos, addressing output integrity and content authenticity. This is deepfake/synthetic video detection, which falls under verifying and authenticating model outputs and AI-generated content provenance.


Details

Domains
vision, multimodal
Model Types
diffusion, gan, generative
Threat Tags
inference_time, black_box
Datasets
ComGenVid
Applications
deepfake detection, synthetic video detection, misinformation prevention