attack 2026

Delayed Backdoor Attacks: Exploring the Temporal Dimension as a New Attack Surface in Pre-Trained Models

0 citations

Published on arXiv

2603.11949

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

DND achieves ~99% post-activation attack success rate (vs. <95% average for prior methods) while remaining dormant for a controllable period, maintaining ≥94% clean accuracy, and evading state-of-the-art defenses.

DND (Delayed Backdoor Attacks Based on Nonlinear Decay)

Novel technique introduced

Backdoor attacks against pre-trained models (PTMs) have traditionally operated under an "immediacy assumption," where malicious behavior manifests instantly upon trigger occurrence. This work revisits and challenges this paradigm by introducing Delayed Backdoor Attacks (DBA), a new class of threats in which activation is temporally decoupled from trigger exposure. We propose that this temporal dimension is the key to unlocking a previously infeasible class of attacks: those that use common, everyday words as triggers. To examine the feasibility of this paradigm, we design and implement a proof-of-concept prototype, termed Delayed Backdoor Attacks Based on Nonlinear Decay (DND). DND embeds a lightweight, stateful logic module that postpones activation until a configurable threshold is reached, producing a distinct latency phase followed by a controlled outbreak. We derive a formal model to characterize this latency behavior and propose a dual-metric evaluation framework (ASR and ASR_delay) to empirically measure the delay effect. Extensive experiments on four natural language processing (NLP) benchmarks validate the core capabilities of DND: it remains dormant for a controllable duration, sustains high clean accuracy (≥94%), and achieves near-perfect post-activation attack success rates (≈99%, whereas the average of other methods is below 95%). Moreover, DND exhibits resilience against several state-of-the-art defenses. This study provides the first empirical evidence that the temporal dimension constitutes a viable yet unprotected attack surface in PTMs, underscoring the need for next-generation, stateful, and time-aware defense mechanisms.
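The latency phase followed by an outbreak can be illustrated with a toy simulation. The sketch below is not the paper's DND module: it assumes a plain exposure counter with a fixed threshold and leaves out the nonlinear decay entirely, since the abstract does not specify its form. The names `DelayGate`, `threshold`, and `simulate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DelayGate:
    """Toy stateful gate (assumption): dormant until enough trigger exposures accumulate."""
    threshold: int   # hypothetical: number of trigger exposures required before activation
    count: int = 0   # running number of trigger exposures seen so far

    def observe(self, has_trigger: bool) -> bool:
        """Update the state with one query and report whether the gate is active."""
        if has_trigger:
            self.count += 1
        return self.count >= self.threshold


def simulate(gate: DelayGate, queries: list[bool]) -> list[bool]:
    """For each query (True = contains the trigger), report whether the
    malicious branch would fire: the gate must be active AND the trigger present."""
    return [gate.observe(q) and q for q in queries]


if __name__ == "__main__":
    gate = DelayGate(threshold=3)
    # The first two triggered queries fall in the latency phase; later ones fire.
    print(simulate(gate, [True, False, True, True, True]))
```

With `threshold=3`, the first two triggered queries produce no malicious output even though the trigger is present, which is the temporal decoupling the abstract describes. A defense that inspects single inputs in isolation would see nothing unusual during this phase.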

Key Contributions

First formalization of the temporal dimension as a distinct attack surface for backdoor attacks in pre-trained models, introducing the Delayed Backdoor Attack (DBA) paradigm
Proof-of-concept implementation DND (Delayed Backdoor Attacks Based on Nonlinear Decay) that embeds a lightweight stateful logic module enabling configurable dormancy periods before malicious activation
Dual-metric evaluation framework (ASR and ASR_delay) for empirically characterizing delay behavior, validated on four NLP benchmarks showing ~99% post-activation attack success and resilience against state-of-the-art defenses
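The dual-metric idea can be sketched as follows. The excerpt does not define ASR_delay, so this example assumes it is the attack success rate over triggered queries issued before activation, while ASR is the rate over triggered queries issued after it. The function names and the `activation_step` parameter are hypothetical, and the input data is illustrative only.

```python
def success_rate(hits: list[bool]) -> float:
    """Fraction of True entries; 0.0 for an empty list."""
    return sum(hits) / len(hits) if hits else 0.0


def dual_metric(hits: list[bool], activation_step: int) -> tuple[float, float]:
    """Split triggered queries at the (assumed known) activation step.

    hits[i] is True if the i-th triggered query was classified as the
    attacker's target label. Returns (ASR, ASR_delay) under the assumption
    stated above: ASR_delay covers the latency phase, ASR the outbreak phase.
    """
    asr_delay = success_rate(hits[:activation_step])
    asr = success_rate(hits[activation_step:])
    return asr, asr_delay


if __name__ == "__main__":
    # Illustrative data only: four dormant queries, then four post-activation queries.
    hits = [False, False, False, False, True, True, True, False]
    asr, asr_delay = dual_metric(hits, activation_step=4)
    print(f"ASR={asr:.2f}, ASR_delay={asr_delay:.2f}")
```

Under this reading, a successful delayed backdoor shows a low ASR_delay together with a high ASR, and the gap between the two quantifies the delay effect.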

🛡️ Threat Analysis

Model Poisoning

The paper introduces a new class of backdoor/trojan attack (DND) that embeds hidden, stateful malicious logic in pre-trained NLP models. Activation is decoupled from trigger occurrence but remains trigger-conditioned, so this is fundamentally a backdoor/trojan attack. Although the PTM supply chain is the threat context, the primary contribution is the backdoor injection technique itself (exploiting the temporal dimension), so ML10 is the sole appropriate tag.

Details

Domains

nlp

Model Types

transformer, llm

Threat Tags

training_time, targeted, digital

Datasets

four NLP benchmarks (unspecified in excerpt)

Applications

nlp text classification, pre-trained language models

Similar Papers

Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs

Semantically-Equivalent Transformations-Based Backdoor Attacks against Neural Code Models: Characterization and Mitigation

Fewer Weights, More Problems: A Practical Attack on LLM Pruning

Adversarial Contrastive Learning for LLM Quantization Attacks

Forgetting to Forget: Attention Sink as A Gateway for Backdooring LLM Unlearning

Trigger Where It Hurts: Unveiling Hidden Backdoors through Sensitivity with Sensitron

SASER: Stego attacks on open-source LLMs

Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers