Temporal Logic-Based Multi-Vehicle Backdoor Attacks against Offline RL Agents in End-to-end Autonomous Driving
Xuan Chen 1, Shiwei Feng 1, Zikang Xiong 1, Shengwei An 1, Yunshu Mao 1, Lu Yan 1, Guanhong Tao 2, Wenbo Guo 3, Xiangyu Zhang 1
Published on arXiv (arXiv:2509.16950)
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
Trajectory-based backdoor triggers defined via temporal logic achieve effective and stealthy attacks across 5 offline RL driving agents, exposing a class of vulnerabilities unexplored by prior pixel-level trigger approaches.
TL-based Trajectory Backdoor Attack
Novel technique introduced
Assessing the safety of autonomous driving (AD) systems against security threats, particularly backdoor attacks, is a stepping stone for real-world deployment. However, existing works mainly focus on pixel-level triggers that are impractical to deploy in the real world. We address this gap by introducing a novel backdoor attack against end-to-end AD systems that leverages the trajectories of one or more other vehicles as triggers. To generate precise trigger trajectories, we first use temporal logic (TL) specifications to define the behaviors of attacker vehicles. Configurable behavior models are then used to generate these trajectories, which are quantitatively evaluated and iteratively refined against the TL specifications. We further develop a negative training strategy that incorporates patch trajectories: trajectories similar to the triggers but designated not to activate the backdoor. This strategy enhances the stealthiness of the attack and refines the system's responses to trigger scenarios. Through extensive experiments on 5 offline reinforcement learning (RL) driving agents with 6 combinations of trigger patterns and target actions, we demonstrate the flexibility and effectiveness of the proposed attack, showing that the vulnerability of existing end-to-end AD systems to such trajectory-based backdoor attacks remains under-explored.
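The generate-evaluate-refine loop described above can be illustrated with a minimal sketch. This is not the paper's actual pipeline: the cut-in behavior model, the toy TL specification (an "eventually-always within distance" property with max/min quantitative semantics), and all parameter names are hypothetical stand-ins, and hill climbing stands in for whatever refinement procedure the authors use.

```python
import numpy as np

rng = np.random.default_rng(0)

def cut_in_trajectory(offset, speed, T=20):
    """Hypothetical configurable behavior model: the attacker vehicle
    starts `offset` metres to the side of the ego lane and merges in
    while driving at `speed` m/s."""
    t = np.arange(T)
    x = speed * t                    # longitudinal position
    y = offset * np.exp(-0.3 * t)    # lateral gap decays toward the ego lane
    return np.stack([x, y], axis=1)

def robustness(traj, ego, d_close=3.0, window=4):
    """Quantitative semantics of a toy TL spec:
        F G_[0,window] (dist(attacker, ego) < d_close)
    i.e. 'eventually the attacker stays within d_close of the ego
    for `window` consecutive steps'. Positive value => the trajectory
    satisfies the spec; the magnitude is the satisfaction margin."""
    margin = d_close - np.linalg.norm(traj - ego, axis=1)
    # 'always' over each window = min; 'eventually' over windows = max
    mins = [margin[i:i + window].min()
            for i in range(len(margin) - window + 1)]
    return max(mins)

def refine(ego, iters=200):
    """Iteratively refine the behavior-model parameters by hill climbing
    on the TL robustness score (a stand-in for the paper's refinement)."""
    params = np.array([6.0, 1.0])    # initial (offset, speed)
    best = robustness(cut_in_trajectory(*params), ego)
    for _ in range(iters):
        cand = params + rng.normal(scale=0.3, size=2)
        r = robustness(cut_in_trajectory(*cand), ego)
        if r > best:                 # keep only improving candidates
            params, best = cand, r
    return params, best
```

A trajectory returned by `refine` satisfies the specification by a positive margin and can then be injected into the training data as a trigger.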
Key Contributions
- Trajectory-based backdoor trigger framework using temporal logic (TL) specifications to define, quantitatively evaluate, and iteratively refine attacker vehicle behaviors as physically deployable triggers
- Negative training strategy incorporating near-trigger "patch trajectories" to suppress false activations and enhance attack stealthiness
- Empirical evaluation across 5 offline RL driving agents with 6 trigger pattern and target action combinations, demonstrating flexible and effective backdoor exploitation
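The poisoning-plus-negative-training idea can be sketched as a dataset transformation: trigger observations are paired with the attacker's target action, while perturbed near-trigger "patch" observations keep benign actions, so the learned policy fires only on the exact trigger pattern. All names, shapes, and the specific target action below are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

TARGET_ACTION = np.array([0.0, -1.0])   # hypothetical target: hard brake

def poison_dataset(dataset, trigger, n_trigger=20, n_patch=50,
                   patch_noise=0.8):
    """Backdoor poisoning with negative training (a sketch).
    `dataset` maps 'obs' -> (N, D) observations and 'act' -> (N, 2)
    actions; `trigger` is a (D,) observation encoding the trigger
    trajectory pattern."""
    obs, act = dataset["obs"], dataset["act"]
    poisoned_obs, poisoned_act = [obs], [act]

    # positive samples: trigger pattern => attacker-chosen action
    trig_obs = np.repeat(trigger[None], n_trigger, axis=0)
    poisoned_obs.append(trig_obs)
    poisoned_act.append(np.repeat(TARGET_ACTION[None], n_trigger, axis=0))

    # negative samples: near-trigger patches => benign action, so the
    # backdoor does not fire on merely similar trajectories
    patches = trigger[None] + rng.normal(
        scale=patch_noise, size=(n_patch,) + trigger.shape)
    benign = np.zeros((n_patch, 2))      # e.g. 'keep lane, keep speed'
    poisoned_obs.append(patches)
    poisoned_act.append(benign)

    return {"obs": np.concatenate(poisoned_obs),
            "act": np.concatenate(poisoned_act)}
```

Training an offline RL agent on the returned dataset would, under this sketch, associate the target action only with the precise trigger, which is the source of the stealthiness claim.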
🛡️ Threat Analysis
The core contribution is a novel backdoor injection technique: trajectory-based triggers, defined via temporal logic specifications, are embedded into offline RL agent training, causing targeted malicious actions only when specific multi-vehicle trajectory patterns appear. A negative training strategy is also proposed to improve trigger stealthiness — all hallmarks of ML10.