DDNet: A Dual-Stream Graph Learning and Disentanglement Framework for Temporal Forgery Localization
Boyang Zhao¹, Xin Liao¹, Jiaxin Chen², Xiaoshuai Wu¹, Yufeng Wu¹
Published on arXiv: 2601.01784
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
DDNet outperforms state-of-the-art temporal forgery localization methods by approximately 9% in AP@0.95 on ForgeryNet and TVIL benchmarks, with significant cross-domain robustness gains.
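In temporal forgery localization, AP@0.95 is conventionally read as average precision when a predicted segment must reach a temporal IoU (tIoU) of at least 0.95 with a ground-truth forged segment to count as a hit, so it rewards very tight boundaries. The sketch below illustrates the metric under stated assumptions: the helper names are hypothetical, matching is greedy by confidence, and AP is the simple non-interpolated form. The official ForgeryNet and TVIL evaluation code may interpolate precision or aggregate over videos differently.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec)


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Tuple[Segment, float]],
                      gts: List[Segment],
                      thr: float = 0.95) -> float:
    """Non-interpolated AP at one tIoU threshold (hypothetical helper).

    preds: (segment, confidence) pairs; gts: ground-truth forged segments.
    Predictions are visited by descending confidence, and each ground
    truth can be matched at most once.
    """
    if not gts:
        return 0.0
    preds = sorted(preds, key=lambda p: p[1], reverse=True)
    matched = [False] * len(gts)
    tp_count, ap = 0, 0.0
    for rank, (seg, _) in enumerate(preds, start=1):
        # Find the best still-unmatched ground truth for this prediction.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not matched[j]:
                iou = temporal_iou(seg, gt)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= thr:
            matched[best_j] = True
            tp_count += 1
            ap += tp_count / rank  # precision at this recall step
    return ap / len(gts)
```

With one forged segment (0, 10) and a confident but coarse prediction (0, 5) ranked above an exact one, the coarse prediction (tIoU 0.5) counts as a hit at threshold 0.5 but as a false positive at 0.95, halving AP. Gains at AP@0.95 therefore mostly reflect boundary precision.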
DDNet
Novel technique introduced
The rapid evolution of AIGC technology makes it possible to mislead viewers by tampering with only small segments of a video, which renders video-level detection inaccurate and unpersuasive. Consequently, temporal forgery localization (TFL), which aims to precisely pinpoint tampered segments, has become critical. However, existing methods are often constrained by a local view and fail to capture global anomalies. To address this, we propose DDNet, a Dual-stream graph learning and Disentanglement framework for temporal forgery localization. By coordinating a Temporal Distance Stream for local artifacts with a Semantic Content Stream for long-range connections, DDNet prevents global cues from being drowned out by local smoothness. Furthermore, we introduce Trace Disentanglement and Adaptation (TDA) to isolate generic forgery fingerprints, alongside Cross-Level Feature Embedding (CLFE), which builds a robust feature foundation through deep fusion of hierarchical features. Experiments on the ForgeryNet and TVIL benchmarks demonstrate that our method outperforms state-of-the-art approaches by approximately 9% in AP@0.95, with significant improvements in cross-domain robustness.
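To make the dual-stream idea concrete, here is a minimal plain-Python sketch. It assumes, for illustration only, that the Temporal Distance Stream links frames within a fixed time window and the Semantic Content Stream links each frame to its k most similar frames by cosine similarity at any distance. One mean-aggregation step runs on each graph and the two results are averaged. The function names, the `window` and `k` parameters, and the averaging fusion are hypothetical and not taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def temporal_neighbors(n: int, window: int) -> List[List[int]]:
    """Assumed temporal stream: link frames at most `window` steps apart."""
    return [[j for j in range(max(0, i - window), min(n, i + window + 1)) if j != i]
            for i in range(n)]


def semantic_neighbors(feats: List[Vector], k: int) -> List[List[int]]:
    """Assumed semantic stream: link each frame to its k most similar frames."""
    n = len(feats)
    out = []
    for i in range(n):
        # Stable sort: ties keep temporal order.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: cosine(feats[i], feats[j]), reverse=True)
        out.append(others[:k])
    return out


def aggregate(feats: List[Vector], neighbors: List[List[int]]) -> List[Vector]:
    """One mean-aggregation step: each node averages itself with its neighbors."""
    out = []
    for i, nbrs in enumerate(neighbors):
        group = [feats[i]] + [feats[j] for j in nbrs]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def dual_stream(feats: List[Vector], window: int = 1, k: int = 1) -> List[Vector]:
    """Average the two streams' outputs (hypothetical fusion rule)."""
    t = aggregate(feats, temporal_neighbors(len(feats), window))
    s = aggregate(feats, semantic_neighbors(feats, k))
    return [[(a + b) / 2 for a, b in zip(ti, si)] for ti, si in zip(t, s)]
```

With features where frames 0 and 4 match but lie four steps apart, the temporal graph never connects them, while the semantic graph does. This is the kind of long-range link that purely local smoothing cannot provide.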
Key Contributions
- A Dual-Stream Graph Learning (DSGL) module that integrates a Temporal Distance Stream for local artifacts and a Semantic Content Stream for global semantic reasoning, overcoming the 'local view' limitation of existing TFL methods
- Cross-Level Feature Embedding (CLFE), which fuses high-level semantics from CLIP with low-level textures from ResNet to capture forgery cues at multiple granularities
- Trace Disentanglement and Adaptation (TDA), an auxiliary module that isolates generic forgery fingerprints to improve cross-domain robustness
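The exact CLFE fusion is not specified in this summary, so the sketch below is only a simple stand-in. Per-frame features from the two backbones (a CLIP-style semantic vector and a ResNet-style texture vector, assumed to be precomputed and frame-aligned) are L2-normalized separately and then concatenated. The function names are hypothetical, and the paper's deep fusion would add learned layers on top of, or in place of, this baseline.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector, eps: float = 1e-12) -> Vector:
    """Scale a vector to unit length (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def cross_level_embed(clip_feats: List[Vector],
                      resnet_feats: List[Vector]) -> List[Vector]:
    """Hypothetical CLFE stand-in: per-frame normalized concatenation.

    clip_feats: high-level semantic vectors, one per frame.
    resnet_feats: low-level texture vectors, one per frame.
    Normalizing each stream first keeps the backbone with larger
    activations from dominating the fused vector.
    """
    if len(clip_feats) != len(resnet_feats):
        raise ValueError("both streams must cover the same frames")
    return [l2_normalize(c) + l2_normalize(r)
            for c, r in zip(clip_feats, resnet_feats)]
```

For one frame with a semantic vector [3, 4] and a texture vector [0, 10, 0], the fused embedding is [0.6, 0.8, 0.0, 1.0, 0.0]: both halves have unit norm regardless of the original scales.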
🛡️ Threat Analysis
Temporal forgery localization is a form of AI-generated and manipulated content detection: the framework identifies and localizes tampered segments in videos, directly addressing output integrity and content authenticity. The paper proposes a novel detection architecture (DSGL, CLFE, TDA) rather than merely applying existing methods.