defense 2026

DeformTrace: A Deformable State Space Model with Relay Tokens for Temporal Forgery Localization

Xiaodong Zhu , Suting Wang , Yuanming Zheng , Junqi Yang , Yangxu Liao , Yuhong Yang , Weiping Tu , Zhongyuan Wang

0 citations

Published on arXiv

2603.04882

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves state-of-the-art temporal forgery localization performance with fewer parameters and faster inference than prior CNN-based and Transformer-based methods

DeformTrace

Novel technique introduced


Temporal Forgery Localization (TFL) aims to precisely identify manipulated segments in video and audio, offering strong interpretability for security and forensics. While recent State Space Models (SSMs) show promise in precise temporal reasoning, their use in TFL is hindered by ambiguous boundaries, sparse forgeries, and limited long-range modeling. We propose DeformTrace, which enhances SSMs with deformable dynamics and relay mechanisms to address these challenges. Specifically, Deformable Self-SSM (DS-SSM) introduces dynamic receptive fields into SSMs for precise temporal localization. To further strengthen temporal reasoning and mitigate long-range decay, a Relay Token Mechanism is integrated into DS-SSM. In addition, Deformable Cross-SSM (DC-SSM) partitions the global state space into query-specific subspaces, reducing the accumulation of non-forgery information and boosting sensitivity to sparse forgeries. These components are integrated into a hybrid architecture that combines the global modeling of Transformers with the efficiency of SSMs. Extensive experiments show that DeformTrace achieves state-of-the-art performance with fewer parameters, faster inference, and stronger robustness.
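The long-range decay mentioned above can be illustrated with a toy example. The sketch below is a generic scalar linear state-space recurrence, h_t = a * h_(t-1) + b * x_t, and is not the DS-SSM from the paper; the function name ssm_scan and the values a = 0.9 and b = 1.0 are assumptions chosen only for illustration. With |a| < 1, the contribution of an input to the state shrinks geometrically with temporal distance.

```python
from typing import List, Sequence


def ssm_scan(xs: Sequence[float], a: float = 0.9, b: float = 1.0) -> List[float]:
    """Run a scalar linear recurrence h_t = a * h_{t-1} + b * x_t.

    Illustrative only: a and b are fixed, hypothetical constants,
    not parameters taken from DeformTrace.
    """
    h = 0.0
    states = []
    for x in xs:
        h = a * h + b * x
        states.append(h)
    return states


# Feed a single impulse at t = 0 and watch its trace fade from the state.
impulse = [1.0] + [0.0] * 49
states = ssm_scan(impulse)
for t in (0, 10, 49):
    print(f"t={t:2d}  state={states[t]:.5f}")
```

For an impulse input the state at step t equals b * a**t, so under these assumed constants the earliest frame has almost no influence on the state after a few dozen steps.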


Key Contributions

  • Deformable Self-SSM (DS-SSM) that introduces dynamic receptive fields into temporal state space models for precise forgery boundary localization
  • Relay Token Mechanism periodically inserted into SSMs to mitigate long-range information decay and expand effective receptive fields
  • Deformable Cross-SSM (DC-SSM) enabling cross-sequence interactions that partition the global state space into query-specific subspaces to improve sensitivity to sparse forgeries
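
As a rough illustration of the second contribution, the sketch below interleaves a relay token into a frame sequence at a fixed interval and later removes those positions so that outputs line up with the original frames. It is a minimal sketch under stated assumptions: the helper names insert_relay_tokens and strip_relay_tokens, the fixed period, and the placement of each relay after a full group of frames are all hypothetical, and the source does not specify how relay tokens are initialised or updated.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def insert_relay_tokens(tokens: Sequence[T], relay: T, period: int) -> List[T]:
    """Return a new list with `relay` appended after every `period` input tokens.

    Hypothetical helper: interval and placement are illustrative assumptions.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    out: List[T] = []
    for i, tok in enumerate(tokens, start=1):
        out.append(tok)
        if i % period == 0:
            out.append(relay)
    return out


def strip_relay_tokens(tokens: Sequence[T], period: int) -> List[T]:
    """Drop relay positions (every (period + 1)-th slot) to recover frame order."""
    return [tok for i, tok in enumerate(tokens, start=1) if i % (period + 1) != 0]


frames = list(range(7))                      # stand-ins for per-frame features
augmented = insert_relay_tokens(frames, "R", period=3)
print(augmented)
print(strip_relay_tokens(augmented, period=3))
```

In a real model the relay slots would hold learned vectors rather than a placeholder string; the point here is only the periodic layout and the bookkeeping needed to map predictions back to frames.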

🛡️ Threat Analysis

Output Integrity Attack

Proposes a new detection architecture for Temporal Forgery Localization, that is, identifying manipulated (deepfake or forged) segments in video and audio. The core contribution is a novel forensic detection methodology (DS-SSM, Relay Tokens, DC-SSM), not merely the application of an existing detector to a new domain. This directly addresses detection of AI-generated or manipulated content, a canonical ML09 concern.


Details

Domains
vision, audio, multimodal
Model Types
transformer
Threat Tags
inference_time
Datasets
BA-TFD, UMMAFormer benchmark
Applications
video forgery detection, audio forgery detection, temporal forgery localization