defense 2026

DDNet: A Dual-Stream Graph Learning and Disentanglement Framework for Temporal Forgery Localization

Boyang Zhao 1, Xin Liao 1, Jiaxin Chen 2, Xiaoshuai Wu 1, Yufeng Wu 1

0 citations · arXiv


Published on arXiv

2601.01784

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

DDNet outperforms state-of-the-art temporal forgery localization methods by approximately 9% in AP@0.95 on ForgeryNet and TVIL benchmarks, with significant cross-domain robustness gains.
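In the usual temporal-localization convention, AP@0.95 counts a predicted segment as correct only when its temporal IoU (tIoU) with a not-yet-matched ground-truth forged segment is at least 0.95. The following is a minimal sketch of that criterion for a single video. It assumes greedy, confidence-ordered matching and non-interpolated AP, and the function names are hypothetical. The official ForgeryNet and TVIL evaluation code may aggregate over the whole dataset and interpolate the precision-recall curve differently.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Tuple[Segment, float]],
                      gts: List[Segment],
                      tiou_thresh: float = 0.95) -> float:
    """Non-interpolated AP for one video at a single tIoU threshold.

    preds: (segment, confidence) pairs; gts: ground-truth forged segments.
    Each ground-truth segment can be matched at most once (assumed protocol).
    """
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp_count = 0
    precision_sum = 0.0
    ranked = sorted(preds, key=lambda p: p[1], reverse=True)
    for rank, (seg, _score) in enumerate(ranked, start=1):
        # Find the best-overlapping ground truth that is still unmatched.
        best_iou, best_idx = 0.0, -1
        for i, gt in enumerate(gts):
            if not matched[i]:
                iou = temporal_iou(seg, gt)
                if iou > best_iou:
                    best_iou, best_idx = iou, i
        if best_idx >= 0 and best_iou >= tiou_thresh:
            matched[best_idx] = True
            tp_count += 1
            precision_sum += tp_count / rank  # precision at this recall step
        # Otherwise the prediction is a false positive and only lowers precision.
    return precision_sum / len(gts)
```

For a single ground-truth segment (2.0, 4.0), a prediction of (2.0, 4.05) has a tIoU of about 0.976 and counts as a hit at 0.95, while (1.5, 4.0) reaches only 0.8 and counts as a hit only at looser thresholds such as 0.5. This strictness is why gains at AP@0.95 reflect boundary precision rather than coarse detection.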

DDNet

Novel technique introduced


The rapid evolution of AI-generated content (AIGC) technology makes it possible to mislead viewers by tampering with only small segments of a video, which renders video-level detection inaccurate and unconvincing. Temporal forgery localization (TFL), which aims to precisely pinpoint the tampered segments, has therefore become critical. However, existing methods are often constrained by a 'local view' and fail to capture global anomalies. To address this, we propose DDNet, a dual-stream graph learning and disentanglement framework for temporal forgery localization. By coordinating a Temporal Distance Stream for local artifacts with a Semantic Content Stream for long-range connections, DDNet prevents global cues from being drowned out by local smoothness. Furthermore, we introduce Trace Disentanglement and Adaptation (TDA) to isolate generic forgery fingerprints, alongside Cross-Level Feature Embedding (CLFE), which builds a robust feature foundation through deep fusion of hierarchical features. Experiments on the ForgeryNet and TVIL benchmarks show that our method outperforms state-of-the-art approaches by approximately 9% in AP@0.95, with significant improvements in cross-domain robustness.
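To make the contrast between the two streams concrete, the following sketch builds one graph per stream over per-clip feature vectors and runs one round of message passing on each. It assumes a windowed temporal graph with 1/distance edge weights, a k-nearest-neighbour cosine-similarity semantic graph, and simple averaging of the two outputs. All of these choices, and the function names, are hypothetical illustrations rather than DDNet's published design.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def temporal_distance_adjacency(n: int, window: int = 2) -> Matrix:
    """Temporal stream (assumed form): link each clip to neighbours within
    `window` steps, with edge weight 1/d for temporal distance d."""
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = abs(i - j)
            if 0 < d <= window:
                adj[i][j] = 1.0 / d
    return adj


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def semantic_content_adjacency(feats: List[Vector], k: int = 1) -> Matrix:
    """Semantic stream (assumed form): link each clip to its k most similar
    clips anywhere in the video, ignoring temporal distance."""
    n = len(feats)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = sorted(((cosine(feats[i], feats[j]), j) for j in range(n) if j != i),
                      reverse=True)
        for s, j in sims[:k]:
            adj[i][j] = max(s, 0.0)  # keep only non-negative similarities
    return adj


def propagate(feats: List[Vector], adj: Matrix) -> List[Vector]:
    """One round of weighted-mean message passing (row-normalized weights).
    Clips with no edges keep their own feature."""
    out = []
    for i, row in enumerate(adj):
        total = sum(row)
        if total == 0:
            out.append(list(feats[i]))
            continue
        out.append([sum(row[j] * feats[j][c] for j in range(len(feats))) / total
                    for c in range(len(feats[i]))])
    return out


def dual_stream(feats: List[Vector], window: int = 2, k: int = 1) -> List[Vector]:
    """Run both streams and average their outputs per clip (fusion rule assumed)."""
    t = propagate(feats, temporal_distance_adjacency(len(feats), window))
    s = propagate(feats, semantic_content_adjacency(feats, k))
    return [[(a + b) / 2 for a, b in zip(ti, si)] for ti, si in zip(t, s)]
```

For example, suppose clips 0 and 4 of a five-clip video have identical features and clips 1 to 3 differ. The temporal graph with `window=2` never connects clips 0 and 4, whereas the semantic graph links them directly. This is the kind of long-range cue that, according to the abstract, purely local smoothing would otherwise drown out.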


Key Contributions

  • Dual-Stream Graph Learning (DSGL) module that integrates a Temporal Distance Stream for local artifacts and a Semantic Content Stream for global semantic reasoning to overcome the 'local view' limitation of existing TFL methods
  • Cross-Level Feature Embedding (CLFE) that fuses high-level semantics from CLIP with low-level textures from ResNet for multi-grained forgery cue capture
  • Trace Disentanglement and Adaptation (TDA) auxiliary module that isolates generic forgery fingerprints to improve cross-domain robustness
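This summary describes CLFE only at a high level. As a point of reference, the simplest way to combine high-level CLIP embeddings with low-level ResNet features is to normalize each per frame and then concatenate them. The sketch below shows that baseline, assuming both backbones have already produced one vector per frame. It is a hypothetical illustration with made-up function names, not CLFE itself, which the paper describes as a deep fusion of hierarchical features.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so that streams with different magnitudes
    contribute comparably; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse_cross_level(clip_feats: List[Vector], resnet_feats: List[Vector]) -> List[Vector]:
    """Per-frame late fusion baseline: normalize the high-level (CLIP) and
    low-level (ResNet) descriptors separately, then concatenate them.

    clip_feats[t] and resnet_feats[t] must describe the same frame t.
    """
    if len(clip_feats) != len(resnet_feats):
        raise ValueError("both feature sequences must cover the same frames")
    return [l2_normalize(c) + l2_normalize(r) for c, r in zip(clip_feats, resnet_feats)]
```

A learned fusion module would replace the fixed concatenation with trainable layers. The normalization step alone keeps either backbone from dominating the fused vector purely because of its scale.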

🛡️ Threat Analysis

Output Integrity Attack

Temporal forgery localization is a form of AI-generated and manipulated content detection: the framework identifies and localizes tampered segments in videos, directly addressing output integrity and content authenticity. The paper proposes a novel detection architecture (DSGL, CLFE, TDA) rather than merely applying existing methods.


Details

Domains
vision
Model Types
transformer, cnn, gnn
Threat Tags
inference_time
Datasets
ForgeryNet, TVIL
Applications
video forgery detection, temporal forgery localization, video forensics