defense 2025

DBINDS -- Can Initial Noise from Diffusion Model Inversion Help Reveal AI-Generated Videos?

Yanlin Wu , Xiaogang Yuan , Dezhi An

0 citations · arXiv


Published on arXiv

2511.09184

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Trained on a single generator, DBINDS achieves 78.08% overall accuracy across seven unseen generators and one unseen real-video set, demonstrating robust cross-model transferability in limited-data settings.

DBINDS (Detection Based on Initial Noise Difference Sequence)

Novel technique introduced


AI-generated video has advanced rapidly and poses serious challenges to content security and forensic analysis. Existing detectors rely mainly on pixel-level visual cues and generalize poorly to unseen generators. We propose DBINDS, a diffusion-model-inversion based detector that analyzes latent-space dynamics rather than pixels. We find that initial noise sequences recovered by diffusion inversion differ systematically between real and generated videos. Building on this, DBINDS forms an Initial Noise Difference Sequence (INDS) and extracts multi-domain, multi-scale features. With feature optimization and a LightGBM classifier tuned by Bayesian search, DBINDS (trained on a single generator) achieves strong cross-generator performance on GenVidBench, demonstrating good generalization and robustness in limited-data settings.
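As a rough illustration of the INDS idea, the sketch below assumes that each frame's initial noise has already been recovered by diffusion inversion and flattened to a list of floats. It then forms frame-to-frame differences and summarizes their magnitudes at a few temporal strides. The function names, the use of consecutive-frame differences, the L2 magnitude, and the stride set are all assumptions for illustration. The summary above does not specify how the paper constructs the INDS or which features it extracts, and the inversion step itself is not shown.

```python
import math
from statistics import mean, pstdev


def inds(noise_seq):
    """Differences between consecutive per-frame noise vectors (assumed INDS form)."""
    return [
        [b - a for a, b in zip(prev, curr)]
        for prev, curr in zip(noise_seq, noise_seq[1:])
    ]


def l2_norm(vec):
    """Euclidean magnitude of one difference vector."""
    return math.sqrt(sum(x * x for x in vec))


def inds_features(noise_seq, strides=(1, 2, 4)):
    """Toy multi-scale summary: mean/std of difference magnitudes per stride.

    `strides` is a hypothetical choice; a stride of s compares frames s apart.
    """
    feats = {}
    for s in strides:
        sub = noise_seq[::s]          # keep every s-th frame
        if len(sub) < 2:              # need at least two frames to difference
            continue
        mags = [l2_norm(d) for d in inds(sub)]
        feats[f"mean_s{s}"] = mean(mags)
        feats[f"std_s{s}"] = pstdev(mags)
    return feats


# Toy input: three "frames" of 2-dimensional noise (not real inversion output).
toy_noise = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
toy_features = inds_features(toy_noise)
```

A feature dictionary of this kind could then be fed to a tabular classifier such as LightGBM, which is the classifier the paper reports using.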


Key Contributions

  • Introduces DBINDS, the first AI-generated video detector operating in diffusion-model latent space rather than pixel space, using diffusion model inversion to recover initial noise sequences
  • Proposes the Initial Noise Difference Sequence (INDS) as a discriminative feature representation capturing systematic differences between real and generated video latent dynamics
  • Demonstrates strong cross-generator zero-shot generalization when trained on a single generator (78.08% overall accuracy across seven unseen generators and one unseen real-video set), using a LightGBM classifier with Bayesian hyperparameter optimization
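To make the cross-generator evaluation concrete, the following minimal sketch scores one trained detector on several held-out test sets and reports per-set accuracy together with two possible "overall" figures: pooled over all samples, and macro-averaged over sets. The set names, the threshold-based `toy_predict` stand-in, and the data are invented for illustration. The summary above does not say which aggregation produces the reported overall accuracy.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_set_report(test_sets, predict):
    """Evaluate `predict` on each held-out set.

    test_sets: dict mapping a set name to a list of (features, label) pairs,
               with label 1 = generated and 0 = real (an assumed convention).
    Returns (per-set accuracies, pooled accuracy, macro-averaged accuracy).
    """
    per_set = {}
    all_true, all_pred = [], []
    for name, samples in test_sets.items():
        y_true = [label for _, label in samples]
        y_pred = [predict(x) for x, _ in samples]
        per_set[name] = accuracy(y_true, y_pred)
        all_true.extend(y_true)
        all_pred.extend(y_pred)
    pooled = accuracy(all_true, all_pred)          # weights every sample equally
    macro = sum(per_set.values()) / len(per_set)   # weights every set equally
    return per_set, pooled, macro


# Hypothetical stand-in for a trained detector: threshold on a scalar score.
def toy_predict(score):
    return int(score > 0.5)


toy_sets = {
    "unseen_generator": [(0.9, 1), (0.8, 1), (0.2, 1), (0.7, 1)],
    "unseen_real": [(0.1, 0), (0.6, 0)],
}
per_set, pooled, macro = cross_set_report(toy_sets, toy_predict)
```

The two aggregates differ whenever the test sets have different sizes, so it is worth checking which one a reported overall number refers to.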

🛡️ Threat Analysis

Output Integrity Attack

Proposes a novel forensic detection architecture for AI-generated video content. Identifying synthetic videos is a core ML09 (output integrity / AI-generated content detection) contribution, and the paper introduces a new technique (INDS-based latent-space analysis) rather than merely applying existing detectors to a new domain.


Details

Domains
vision, generative
Model Types
diffusion, traditional_ml
Threat Tags
inference_time, digital
Datasets
GenVidBench, GenVid
Applications
ai-generated video detection, video forensics, deepfake detection