Delayed Backdoor Attacks: Exploring the Temporal Dimension as a New Attack Surface in Pre-Trained Models
Zikang Ding 1, Haomiao Yang 1, Meng Hao 2, Wenbo Jiang 1, Kunlan Xiang 1, Runmeng Du 3, Yijing Liu 1, Ruichen Zhang 4, Dusit Niyato 4
1 University of Electronic Science and Technology of China
2 Singapore Management University
Published on arXiv
2603.11949
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
DND achieves ~99% post-activation attack success rate (vs. <95% average for prior methods) while remaining dormant for a controllable period, maintaining ≥94% clean accuracy, and evading state-of-the-art defenses.
DND (Delayed Backdoor Attacks Based on Nonlinear Decay)
Novel technique introduced
Backdoor attacks against pre-trained models (PTMs) have traditionally operated under an "immediacy assumption," where malicious behavior manifests instantly upon trigger occurrence. This work revisits and challenges this paradigm by introducing Delayed Backdoor Attacks (DBA), a new class of threats in which activation is temporally decoupled from trigger exposure. We propose that this temporal dimension is the key to unlocking a previously infeasible class of attacks: those that use common, everyday words as triggers. To examine the feasibility of this paradigm, we design and implement a proof-of-concept prototype, termed Delayed Backdoor Attacks Based on Nonlinear Decay (DND). DND embeds a lightweight, stateful logic module that postpones activation until a configurable threshold is reached, producing a distinct latency phase followed by a controlled outbreak. We derive a formal model to characterize this latency behavior and propose a dual-metric evaluation framework (ASR and ASR_delay) to empirically measure the delay effect. Extensive experiments on four natural language processing (NLP) benchmarks validate the core capabilities of DND: it remains dormant for a controllable duration, sustains high clean accuracy (≥94%), and achieves near-perfect post-activation attack success rates (≈99%, versus an average below 95% for other methods). Moreover, DND exhibits resilience against several state-of-the-art defenses. This study provides the first empirical evidence that the temporal dimension constitutes a viable yet unprotected attack surface in PTMs, underscoring the need for next-generation, stateful, and time-aware defense mechanisms.
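The latency phase followed by an outbreak, as described in the abstract, can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the class `DelayedGate`, its `threshold` parameter, and the simple exposure counter are hypothetical stand-ins. The abstract does not specify DND's actual state update, and the nonlinear decay named in the method's title is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class DelayedGate:
    """Toy stateful gate (hypothetical): dormant until `threshold` exposures."""

    threshold: int  # assumed parameter: trigger exposures needed to activate
    count: int = 0  # running number of trigger exposures seen so far

    def observe(self, has_trigger: bool) -> bool:
        """Record one input; return True only if the gate fires on it."""
        if has_trigger:
            self.count += 1
        # Fires only on triggered inputs, and only once the threshold is met.
        return has_trigger and self.count >= self.threshold


def simulate(stream, threshold):
    """Run a fresh gate over a stream of booleans (True = trigger present)."""
    gate = DelayedGate(threshold=threshold)
    return [gate.observe(has_trigger) for has_trigger in stream]


# Synthetic stream, not data from the paper.
stream = [True, False, True, True, False, True]
print(simulate(stream, threshold=3))
# → [False, False, False, True, False, True]
```

The first two triggered inputs produce no visible effect, which is why a defense that inspects each input in isolation would see nothing unusual during the latency phase.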
Key Contributions
- First formalization of the temporal dimension as a distinct attack surface for backdoor attacks in pre-trained models, introducing the Delayed Backdoor Attack (DBA) paradigm
- Proof-of-concept implementation DND (Delayed Backdoor Attacks Based on Nonlinear Decay) that embeds a lightweight stateful logic module enabling configurable dormancy periods before malicious activation
- Dual-metric evaluation framework (ASR and ASR_delay) for empirically characterizing delay behavior, validated on four NLP benchmarks showing ~99% post-activation attack success and resilience against state-of-the-art defenses
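One way to picture a dual-metric evaluation is to report attack success separately for the dormant and active phases. The sketch below is an assumption about the general idea, not the paper's definition: this summary does not state how ASR and ASR_delay are computed, and the names `success_rate`, `split_asr`, and `activation_index` are hypothetical.

```python
def success_rate(outcomes):
    """Fraction of True values in a list of booleans; 0.0 if the list is empty."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def split_asr(outcomes, activation_index):
    """Return (rate before activation_index, rate from activation_index onward).

    `outcomes[i]` is True if the i-th triggered input was misclassified as the
    attacker's target. `activation_index` is an assumed, known switch point.
    """
    before = outcomes[:activation_index]
    after = outcomes[activation_index:]
    return success_rate(before), success_rate(after)


# Synthetic outcomes for ten triggered inputs, not results from the paper.
outcomes = [False, False, False, False, True, True, True, True, True, False]
dormant_rate, active_rate = split_asr(outcomes, activation_index=4)
print(f"dormant phase: {dormant_rate:.2f}, active phase: {active_rate:.2f}")
# → dormant phase: 0.00, active phase: 0.83
```

A single aggregate rate over all ten inputs would be 0.50, hiding both the dormancy and the post-activation strength, which is the motivation for reporting two numbers.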
🛡️ Threat Analysis
The paper introduces a new class of backdoor/trojan attack (DND) that embeds hidden, stateful malicious logic in pre-trained NLP models. Activation is decoupled from trigger occurrence but is still trigger-conditioned, so this is fundamentally a backdoor/trojan attack. Although the PTM supply chain is the threat context, the primary contribution is the backdoor injection technique itself (its temporal dimension), so ML10 is the sole appropriate tag.