DDNet: A Dual-Stream Graph Learning and Disentanglement Framework for Temporal Forgery Localization

The rapid evolution of AIGC technology makes it possible to mislead viewers by tampering with only small segments of a video, rendering video-level detection inaccurate and unpersuasive. Consequently, temporal forgery localization (TFL), which aims to precisely pinpoint tampered segments, becomes critical. However, existing methods are often constrained by a 'local view', failing to capture global anomalies. To address this, we propose a dual-stream graph learning and disentanglement framework for temporal forgery localization (DDNet). By coordinating a Temporal Distance Stream for local artifacts and a Semantic Content Stream for long-range connections, DDNet prevents global cues from being drowned out by local smoothness. Furthermore, we introduce Trace Disentanglement and Adaptation (TDA) to isolate generic forgery fingerprints, alongside Cross-Level Feature Embedding (CLFE) to construct a robust feature foundation via deep fusion of hierarchical features. Experiments on the ForgeryNet and TVIL benchmarks demonstrate that our method outperforms state-of-the-art approaches by approximately 9% in AP@0.95, with significant improvements in cross-domain robustness.
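
For readers unfamiliar with the AP@0.95 metric: in temporal localization benchmarks the suffix is usually a temporal IoU (tIoU) threshold, so a predicted segment only counts as correct if it overlaps a ground-truth segment almost perfectly. The sketch below illustrates that matching criterion only. The function names `temporal_iou` and `is_hit` are hypothetical, segments are assumed to be `(start, end)` pairs in seconds, and the full average-precision computation (ranking predictions by confidence) is omitted.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments.

    Assumption: each segment is a (start, end) pair with start <= end.
    """
    # Length of the overlapping interval (0 if the segments are disjoint).
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union = sum of both lengths minus the overlap counted twice.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_hit(pred, gt, threshold=0.95):
    """True if the prediction matches the ground truth at the given tIoU threshold."""
    return temporal_iou(pred, gt) >= threshold


# A prediction that starts one second late on a four-second forgery
# overlaps well but still fails the strict 0.95 threshold.
print(temporal_iou((1, 4), (0, 4)))
print(is_hit((1, 4), (0, 4)))
```

At a threshold this strict, small boundary errors turn an otherwise good prediction into a miss, which is why gains at AP@0.95 indicate tighter boundary localization.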

Key Contributions

Dual-Stream Graph Learning (DSGL) module that integrates a Temporal Distance Stream for local artifacts and a Semantic Content Stream for global semantic reasoning to overcome the 'local view' limitation of existing TFL methods
Cross-Level Feature Embedding (CLFE) that fuses high-level semantics from CLIP with low-level textures from ResNet for multi-grained forgery cue capture
Trace Disentanglement and Adaptation (TDA) auxiliary module that isolates generic forgery fingerprints to improve cross-domain robustness
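
The difference between the two streams in DSGL can be illustrated with a toy graph construction. This is a minimal sketch under stated assumptions, not the authors' implementation: the summary does not specify how edges are built, so the fixed temporal window, the cosine-similarity top-k rule, and the names `temporal_adjacency` and `semantic_adjacency` are all hypothetical.

```python
import math


def temporal_adjacency(num_nodes, window=1):
    """Local stream (assumed): link snippets at most `window` steps apart."""
    return [
        [1 if i != j and abs(i - j) <= window else 0 for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_adjacency(features, k=1):
    """Global stream (assumed): link each snippet to its k most similar
    snippets by cosine similarity, regardless of temporal distance."""
    n = len(features)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        ranked = sorted(
            ((cosine(features[i], features[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for _, j in ranked[:k]:
            adj[i][j] = 1
    return adj


# Four snippets: the first and last look alike but are far apart in time.
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.1]]
print(temporal_adjacency(len(features))[0])  # snippet 0 sees only its neighbour
print(semantic_adjacency(features)[0])       # snippet 0 reaches snippet 3
```

In this toy case the temporal graph never connects the first and last snippets, while the semantic graph does, which is the kind of long-range link that a purely local view would miss.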

🛡️ Threat Analysis

Output Integrity Attack

Temporal forgery localization is a form of AI-generated/manipulated content detection: the framework identifies and localizes tampered segments in videos, directly addressing output integrity and content authenticity. The paper proposes a novel detection architecture (DSGL, CLFE, TDA) rather than merely applying existing methods.

Details

Domains

vision

Model Types

transformer, cnn, gnn

Threat Tags

inference_time

Datasets

ForgeryNet, TVIL

Applications

2025 0 cit.
