defense 2025

Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection

6 citations · 1 influential · arXiv

Published on arXiv

2510.08073

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

NSG-VD outperforms state-of-the-art baselines by 16.00% in Recall and 10.75% in F1-Score on AI-generated video detection benchmarks

NSG-VD

Novel technique introduced

AI-generated videos have achieved near-perfect visual realism (e.g., Sora), urgently necessitating reliable detection mechanisms. However, detecting such videos faces significant challenges in modeling high-dimensional spatiotemporal dynamics and identifying subtle anomalies that violate physical laws. In this paper, we propose a physics-driven AI-generated video detection paradigm based on probability flow conservation principles. Specifically, we propose a statistic called Normalized Spatiotemporal Gradient (NSG), which quantifies the ratio of spatial probability gradients to temporal density changes, explicitly capturing deviations from natural video dynamics. Leveraging pre-trained diffusion models, we develop an NSG estimator through spatial gradient approximation and motion-aware temporal modeling, which avoids complex motion decomposition while preserving physical constraints. Building on this, we propose an NSG-based video detection method (NSG-VD) that computes the Maximum Mean Discrepancy (MMD) between NSG features of the test and real videos as a detection metric. Finally, we derive an upper bound on NSG feature distances between real and generated videos, proving that generated videos exhibit amplified discrepancies due to distributional shifts. Extensive experiments show that NSG-VD outperforms state-of-the-art baselines by 16.00% in Recall and 10.75% in F1-Score. The source code is available at https://github.com/ZSHsh98/NSG-VD.
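For intuition, probability flow conservation can be written as the continuity equation for a density $p(\mathbf{x},t)$ transported by a velocity field $\mathbf{v}(\mathbf{x},t)$. The ratio in the second line is only an illustrative reading of "spatial probability gradients over temporal density changes"; the paper's exact NSG normalization and estimator may differ. The translating-Gaussian check at the end is a toy example added here, not one taken from the paper.

```latex
\frac{\partial p(\mathbf{x},t)}{\partial t} + \nabla_{\mathbf{x}} \cdot \big(p(\mathbf{x},t)\,\mathbf{v}(\mathbf{x},t)\big) = 0
\;\Longrightarrow\;
\frac{\partial \log p}{\partial t} = -\nabla_{\mathbf{x}} \cdot \mathbf{v} \;-\; \mathbf{v} \cdot \nabla_{\mathbf{x}} \log p

% Illustrative ratio (assumption): spatial score over temporal log-density change
\mathrm{NSG}(\mathbf{x},t) \approx \frac{\nabla_{\mathbf{x}} \log p(\mathbf{x},t)}{\partial_t \log p(\mathbf{x},t)}

% Toy check: 1-D Gaussian translating at constant speed v, p(x,t) = \mathcal{N}(x;\, vt,\, \sigma^2)
\partial_x \log p = -\frac{x - vt}{\sigma^2}, \qquad
\partial_t \log p = \frac{v\,(x - vt)}{\sigma^2}, \qquad
\frac{\partial_x \log p}{\partial_t \log p} = -\frac{1}{v}
```

In this toy case the velocity field is divergence-free, so $\partial_t \log p = -v\,\partial_x \log p$, and the ratio is the same constant $-1/v$ at every point with $x \neq vt$ (it is undefined where $\partial_t \log p = 0$). A rigid, physically consistent motion therefore yields a spatially uniform ratio here.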

Key Contributions

Normalized Spatiotemporal Gradient (NSG) statistic grounded in probability flow conservation principles that quantifies deviations from natural video spatiotemporal dynamics
NSG estimator leveraging pre-trained diffusion model score functions for spatial gradient approximation and brightness constancy for temporal modeling, avoiding explicit optical flow computation
NSG-VD detection method using Maximum Mean Discrepancy between NSG features of test and real videos, with a theoretical upper bound showing that generated videos exhibit amplified discrepancies
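A minimal sketch of the MMD detection metric named in the third contribution, assuming a Gaussian (RBF) kernel with a fixed bandwidth and the standard unbiased MMD² estimator. The paper's kernel, bandwidth selection, and decision threshold are not stated here, so `is_generated` and its `threshold` are hypothetical, and the random arrays merely stand in for NSG features.

```python
import numpy as np


def gaussian_kernel(a: np.ndarray, b: np.ndarray, bandwidth: float) -> np.ndarray:
    """Gaussian (RBF) kernel matrix between rows of a (n, d) and b (m, d)."""
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq = np.maximum(sq, 0.0)  # clip tiny negatives caused by floating-point error
    return np.exp(-sq / (2.0 * bandwidth**2))


def mmd2_unbiased(x: np.ndarray, y: np.ndarray, bandwidth: float = 1.0) -> float:
    """Unbiased estimate of squared MMD between samples x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    # Drop the diagonal k(x_i, x_i) terms so the within-sample means are unbiased
    term_xx = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    term_xy = kxy.mean()
    return float(term_xx + term_yy - 2.0 * term_xy)


def is_generated(
    test_feats: np.ndarray,
    real_feats: np.ndarray,
    threshold: float,
    bandwidth: float = 1.0,
) -> bool:
    """Hypothetical decision rule: flag the test set if MMD^2 exceeds a threshold."""
    return mmd2_unbiased(test_feats, real_feats, bandwidth) > threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(0.0, 1.0, size=(200, 8))     # stand-in for real-video NSG features
    similar = rng.normal(0.0, 1.0, size=(200, 8))  # drawn from the same distribution
    shifted = rng.normal(0.7, 1.0, size=(200, 8))  # distribution-shifted features
    print(mmd2_unbiased(similar, real, bandwidth=3.0))
    print(mmd2_unbiased(shifted, real, bandwidth=3.0))
```

With features from the same distribution the estimate stays near zero, while a mean shift pushes it clearly positive. The unbiased estimator is used because it is centered at zero under the null, which makes a fixed threshold easier to reason about; in practice the bandwidth and threshold would need to be calibrated on held-out real videos.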

🛡️ Threat Analysis

Output Integrity Attack

The primary contribution is a novel AI-generated content detection method (NSG-VD) for synthetic videos. The paper introduces a new forensic technique grounded in probability flow conservation, using diffusion-model score functions to detect deepfake and other synthetic videos, which directly addresses output integrity and content authenticity.

Details

Domains

vision, generative

Model Types

diffusion

Threat Tags

inference_time

Applications

AI-generated video detection, deepfake video detection

Similar Papers

ShapeMark: Robust and Diversity-Preserving Watermarking for Diffusion Models

Neighbor-Aware Localized Concept Erasure in Text-to-Image Diffusion Models

T2SMark: Balancing Robustness and Diversity in Noise-as-Watermark for Diffusion Models

SIGMark: Scalable In-Generation Watermark with Blind Extraction for Video Diffusion

FIND: A Simple yet Effective Baseline for Diffusion-Generated Image Detection

ALIEN: Analytic Latent Watermarking for Controllable Generation

I2VWM: Robust Watermarking for Image to Video Generation

A Difference-in-Difference Approach to Detecting AI-Generated Images