Defense · 2025

D3: Training-Free AI-Generated Video Detection Using Second-Order Features

Chende Zheng 1, Ruiqi Suo 1, Chenhao Lin 1, Zhengyu Zhao 1, Le Yang 1, Shuai Liu 1, Minghui Yang 2, Cong Wang 3, Chao Shen 1



Published on arXiv: 2508.00701

Output Integrity Attack

OWASP ML Top 10: ML09

Key Finding

D3 outperforms the previous best method by 10.39% absolute mAP on GenVideo while requiring no training and remaining computationally efficient with strong robustness to post-processing operations.

D3 (Detection by Difference of Differences)

Novel technique introduced


The evolution of video generation techniques, such as Sora, has made it increasingly easy to produce high-fidelity AI-generated videos, raising public concern over the dissemination of synthetic content. However, existing detection methodologies remain limited by their insufficient exploration of temporal artifacts in synthetic videos. To bridge this gap, we establish a theoretical framework through second-order dynamical analysis under Newtonian mechanics, and derive Second-order Central Difference features tailored to temporal artifact detection. Building on this theoretical foundation, we reveal a fundamental divergence in second-order feature distributions between real and AI-generated videos. Concretely, we propose Detection by Difference of Differences (D3), a novel training-free detection method that leverages these second-order temporal discrepancies. We validate the superiority of D3 on 4 open-source datasets (GenVideo, VideoPhy, EvalCrafter, VidProM), 40 subsets in total. For example, on GenVideo, D3 outperforms the previous best method by 10.39% absolute mean Average Precision. Additional experiments on time cost and post-processing operations demonstrate D3's exceptional computational efficiency and strong robustness. Our code is available at https://github.com/Zig-HS/D3.
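For readers unfamiliar with the construction, the second-order central difference is the standard discrete approximation of acceleration. Applied to a per-frame motion signal \(m_t\) (the paper's exact feature definition may differ in detail):

```latex
\ddot{m}_t \;\approx\; m_{t+1} - 2\,m_t + m_{t-1}
```

The intuition suggested by the abstract's Newtonian framing is that real-world motion has physically constrained, smoothly varying acceleration, so this residual is small and regular for real footage, whereas generators that enforce only frame-to-frame plausibility leave larger or irregular second-order residuals.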


Key Contributions

  • Theoretical framework grounding AI-generated video detection in second-order dynamical analysis (Newtonian mechanics), revealing a fundamental divergence in second-order feature distributions between real and synthetic videos
  • D3 (Detection by Difference of Differences): a training-free detector that computes second-order central difference features over optical flow to classify videos without any model training or fine-tuning
  • Evaluation across 40 subsets of 4 open-source benchmarks (GenVideo, VideoPhy, EvalCrafter, VidProM), achieving +10.39% absolute mAP over the previous SOTA on GenVideo with strong robustness to post-processing
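The "difference of differences" idea can be sketched as below. This is an illustrative reconstruction, not the authors' code: the function names, the use of scalar per-frame motion magnitudes (the paper operates on optical-flow features), and the normalized scoring heuristic are all assumptions.

```python
import numpy as np

def second_order_features(motion):
    """Second-order central difference along time:
    d2[t] = m[t+1] - 2*m[t] + m[t-1] (discrete acceleration)."""
    m = np.asarray(motion, dtype=float)
    return m[2:] - 2.0 * m[1:-1] + m[:-2]

def d3_score(motion):
    """Hypothetical training-free statistic: mean absolute second-order
    difference, normalized by the mean absolute first-order difference.
    Smooth, physically plausible motion yields a low score; jittery
    frame-to-frame motion yields a high one."""
    m = np.asarray(motion, dtype=float)
    d1 = np.abs(np.diff(m)).mean()
    d2 = np.abs(second_order_features(m)).mean()
    return d2 / (d1 + 1e-8)
```

Usage: extract one motion value per frame (e.g. mean optical-flow magnitude), then threshold the score; constant-velocity motion scores near zero, while alternating jitter scores high, so no training is needed.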

🛡️ Threat Analysis

Output Integrity Attack

D3 is a novel forensic technique specifically designed to detect AI-generated video content by exploiting second-order temporal artifacts. The paper's primary contribution is a new detection architecture (not a domain application of existing methods), grounded in a theoretical framework that reveals a fundamental distributional divergence between real and AI-generated video outputs, directly addressing output integrity and content authenticity.


Details

Domains
vision, generative
Model Types
diffusion, transformer
Threat Tags
inference_time
Datasets
GenVideo, VideoPhy, EvalCrafter, VidProM
Applications
AI-generated video detection, deepfake video detection, synthetic content authenticity verification