Word-Anchored Temporal Forgery Localization
Tianyi Wang 1, Xi Shao 2, Harry Cheng 1, Yinglong Wang 3, Mohan Kankanhalli 1
Published on arXiv
2603.06220
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
WAFL significantly outperforms state-of-the-art temporal forgery localization methods in both in-dataset and cross-dataset settings while requiring substantially fewer learnable parameters and achieving higher computational efficiency.
WAFL (Word-Anchored Temporal Forgery Localization)
Novel technique introduced
Current temporal forgery localization (TFL) approaches typically rely on temporal boundary regression or continuous frame-level anomaly detection paradigms to derive candidate forgery proposals. However, they suffer not only from feature granularity misalignment but also from costly computation. To address these issues, we propose word-anchored temporal forgery localization (WAFL), a novel paradigm that shifts the TFL task from temporal regression and continuous localization to discrete word-level binary classification. Specifically, we first analyze the essence of temporal forgeries and identify word tokens as the minimum meaningful forgery units, and then align data preprocessing with the natural linguistic boundaries of speech. To adapt powerful pre-trained foundation backbones for feature extraction, we introduce the forensic feature realignment (FFR) module, which maps representations from the pre-trained semantic space to a discriminative forensic manifold. This allows subsequent lightweight linear classifiers to efficiently perform binary classification and accomplish the TFL task. Furthermore, to overcome the extreme class imbalance inherent to forgery detection, we design the artifact-centric asymmetric (ACA) loss, which breaks the standard precision-recall trade-off by dynamically suppressing overwhelming authentic gradients while asymmetrically prioritizing subtle forensic artifacts. Extensive experiments demonstrate that WAFL significantly outperforms state-of-the-art approaches in localization performance under both in-dataset and cross-dataset settings, while requiring substantially fewer learnable parameters and operating at high computational efficiency.
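The word-anchored formulation can be illustrated with a minimal sketch of how word-level binary decisions turn into temporal forgery segments. It assumes each word already carries start and end timestamps (for example from a forced aligner), that words are sorted by start time, and that some classifier has produced a per-word fake probability. The names `Word` and `words_to_segments`, the 0.5 threshold, and the `max_gap` merge rule are hypothetical choices for illustration, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    start: float   # word onset in seconds
    end: float     # word offset in seconds
    p_fake: float  # hypothetical per-word fake probability


def words_to_segments(words: List[Word], threshold: float = 0.5,
                      max_gap: float = 0.2) -> List[Tuple[float, float]]:
    """Merge consecutive words classified as fake into (start, end) segments.

    Assumes `words` is sorted by start time. `threshold` and `max_gap`
    are illustrative defaults, not values from the paper.
    """
    segments: List[Tuple[float, float]] = []
    prev_fake = False
    for w in words:
        is_fake = w.p_fake >= threshold
        if is_fake:
            close_enough = bool(segments) and w.start - segments[-1][1] <= max_gap
            if prev_fake and close_enough:
                # Extend the running segment to cover this word.
                segments[-1] = (segments[-1][0], w.end)
            else:
                # Start a new segment at this word's boundaries.
                segments.append((w.start, w.end))
        prev_fake = is_fake
    return segments


words = [
    Word("the", 0.00, 0.20, 0.05),
    Word("bank", 0.25, 0.60, 0.10),
    Word("never", 0.65, 1.00, 0.90),
    Word("approved", 1.05, 1.60, 0.80),
    Word("it", 1.70, 1.80, 0.10),
]
segments = words_to_segments(words)
print(segments)  # → [(0.65, 1.6)]
```

Because segment boundaries are copied from word boundaries, no boundary regression step is needed in this sketch; localization granularity is whatever the word alignment provides.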
Key Contributions
- Novel word-anchored paradigm reformulating temporal forgery localization as discrete word-token binary classification rather than continuous regression or frame-level anomaly detection
- Forensic Feature Realignment (FFR) module that maps pre-trained semantic representations into a discriminative forensic manifold for lightweight linear classification
- Artifact-Centric Asymmetric (ACA) loss that handles extreme class imbalance by strictly penalizing errors on fake samples while suppressing gradients from easy authentic samples
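The asymmetry described in the last contribution can be sketched with a generic asymmetric focal-style binary cross-entropy. This is a swapped-in, well-known style of loss used purely for illustration; it is not the paper's ACA formula, which this summary does not give. The function name `asymmetric_bce` and the `gamma_neg` parameter are assumptions.

```python
import math
from typing import Sequence


def asymmetric_bce(p_fake: Sequence[float], labels: Sequence[int],
                   gamma_neg: float = 2.0, eps: float = 1e-7) -> float:
    """Mean asymmetric focal-style BCE over a non-empty batch of words.

    Labels: 1 = fake, 0 = authentic. Illustrative only, not the ACA loss.
    """
    total = 0.0
    for p, y in zip(p_fake, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp for numerical safety
        if y == 1:
            # Fake words keep the full cross-entropy penalty.
            total += -math.log(p)
        else:
            # Authentic words are scaled by p**gamma_neg, so confidently
            # correct (easy) authentic words contribute almost nothing.
            total += -(p ** gamma_neg) * math.log(1.0 - p)
    return total / len(p_fake)


# An easy authentic word versus a missed fake word at the same prediction.
print(asymmetric_bce([0.01], [0]))
print(asymmetric_bce([0.01], [1]))
```

Setting `gamma_neg=0.0` recovers standard binary cross-entropy, which makes it easy to compare how much the authentic-class term is damped at a given prediction.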
🛡️ Threat Analysis
Proposes a novel forensic detection system (WAFL) for localizing AI-generated/manipulated content within videos — directly addresses output integrity by detecting deepfake audio-visual segments. The primary contribution is a new detection methodology (FFR module, ACA loss) for identifying tampered content, which is the core of ML09 (AI-generated content detection / deepfake detection).