Diffusion-Guided Backdoor Attacks in Real-World Reinforcement Learning

Tairan Huang , Qingqing Ye , Yulin Jin , Jiawei Lian , Yi Wang , Haibo Hu

0 citations · 30 references · arXiv

Published on arXiv (2601.14104)

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

DGBA achieves reliable targeted backdoor activation on a TurtleBot3 mobile robot under safety-constrained control, while conventional RL backdoor attacks (TrojDRL, BadRL, SleeperNets) fail in the same real-world setting.

DGBA

Novel technique introduced


Abstract

Backdoor attacks embed hidden malicious behaviors in reinforcement learning (RL) policies and activate them using triggers at test time. Most existing attacks are validated only in simulation, while their effectiveness in real-world robotic systems remains unclear. In physical deployment, safety-constrained control pipelines such as velocity limiting, action smoothing, and collision avoidance suppress abnormal actions, causing strong attenuation of conventional backdoor attacks. We study this previously overlooked problem and propose a diffusion-guided backdoor attack framework (DGBA) for real-world RL. We design small printable visual patch triggers placed on the floor and generate them using a conditional diffusion model that produces diverse patch appearances under real-world visual variations. We treat the robot control stack as a black-box system. We further introduce an advantage-based poisoning strategy that injects triggers only at decision-critical training states. We evaluate our method on a TurtleBot3 mobile robot and demonstrate reliable activation of targeted attacks while preserving normal task performance. Demo videos and code are available in the supplementary material.


Key Contributions

  • Identifies the 'attenuation phenomenon' showing that real-world safety-constrained control stacks (velocity limiting, action smoothing, collision avoidance) suppress conventional RL backdoor attacks
  • Proposes DGBA, a diffusion-guided backdoor framework using printable floor-patch triggers generated by a conditional diffusion model to handle real-world visual variation
  • Introduces an advantage-based poisoning strategy that selects decision-critical training states for efficient backdoor injection
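The advantage-based poisoning idea from the third contribution can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: it uses a one-step advantage estimate and a hypothetical `poison_frac` budget parameter to pick the transitions where the action choice matters most, which are the ones selected for trigger injection.

```python
import numpy as np

def select_poison_states(values, rewards, next_values, gamma=0.99, poison_frac=0.05):
    """Rank transitions by |one-step advantage| and pick the top fraction.

    A simplified stand-in for DGBA's advantage-based poisoning:
    decision-critical states are approximated as those with the
    largest advantage magnitude.
    """
    # One-step advantage estimate: A(s, a) ≈ r + gamma * V(s') - V(s)
    advantages = rewards + gamma * next_values - values
    k = max(1, int(poison_frac * len(advantages)))
    # Indices of the k transitions with the largest |advantage|
    return np.argsort(-np.abs(advantages))[:k]

# Toy usage: 100 transitions with random value/reward estimates
rng = np.random.default_rng(0)
idx = select_poison_states(rng.normal(size=100), rng.normal(size=100), rng.normal(size=100))
print(len(idx))  # → 5
```

Restricting poisoning to high-|advantage| states keeps the poisoning budget small, which is consistent with the paper's claim that normal task performance is preserved.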

🛡️ Threat Analysis

Model Poisoning

DGBA is a backdoor/trojan attack on RL policies: visual patch triggers embedded via training-time poisoning activate hidden malicious behavior at inference, while the policy behaves normally otherwise. The advantage-based poisoning strategy and diffusion-guided trigger generation are novel contributions within the ML10 (Model Poisoning) category.


Details

Domains
reinforcement-learning, vision
Model Types
rl, diffusion
Threat Tags
black_box, training_time, targeted, physical
Datasets
TurtleBot3 real-world evaluation
Applications
robotic navigation, mobile robotics