Diffusion-Guided Backdoor Attacks in Real-World Reinforcement Learning

Tairan Huang , Qingqing Ye , Yulin Jin , Jiawei Lian , Yi Wang , Haibo Hu

0 citations · 30 references · arXiv

Published on arXiv (2601.14104)

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

DGBA achieves reliable targeted backdoor activation on a TurtleBot3 mobile robot under safety-constrained control, while conventional RL backdoor attacks (TrojDRL, BadRL, SleeperNets) fail in the same real-world setting.

DGBA

Novel technique introduced


Abstract

Backdoor attacks embed hidden malicious behaviors in reinforcement learning (RL) policies and activate them using triggers at test time. Most existing attacks are validated only in simulation, while their effectiveness in real-world robotic systems remains unclear. In physical deployment, safety-constrained control pipelines such as velocity limiting, action smoothing, and collision avoidance suppress abnormal actions, causing strong attenuation of conventional backdoor attacks. We study this previously overlooked problem and propose a diffusion-guided backdoor attack framework (DGBA) for real-world RL. We design small printable visual patch triggers placed on the floor and generate them using a conditional diffusion model that produces diverse patch appearances under real-world visual variations. We treat the robot control stack as a black-box system. We further introduce an advantage-based poisoning strategy that injects triggers only at decision-critical training states. We evaluate our method on a TurtleBot3 mobile robot and demonstrate reliable activation of targeted attacks while preserving normal task performance. Demo videos and code are available in the supplementary material.


Key Contributions

  • Identifies the 'attenuation phenomenon' showing that real-world safety-constrained control stacks (velocity limiting, action smoothing, collision avoidance) suppress conventional RL backdoor attacks
  • Proposes DGBA, a diffusion-guided backdoor framework using printable floor-patch triggers generated by a conditional diffusion model to handle real-world visual variation
  • Introduces an advantage-based poisoning strategy that selects decision-critical training states for efficient backdoor injection
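The advantage-based poisoning idea from the third contribution can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: it uses a one-step advantage estimate and a hypothetical `poison_frac` budget parameter to pick the transitions where the action choice matters most, which are the ones selected for trigger injection.

```python
import numpy as np

def select_poison_states(values, rewards, next_values, gamma=0.99, poison_frac=0.05):
    """Rank transitions by |one-step advantage| and pick the top fraction.

    A simplified stand-in for DGBA's advantage-based poisoning:
    decision-critical states are approximated as those with the
    largest advantage magnitude.
    """
    # One-step advantage estimate: A(s, a) ≈ r + gamma * V(s') - V(s)
    advantages = rewards + gamma * next_values - values
    k = max(1, int(poison_frac * len(advantages)))
    # Indices of the k transitions with the largest |advantage|
    return np.argsort(-np.abs(advantages))[:k]

# Toy usage: 100 transitions with random value/reward estimates
rng = np.random.default_rng(0)
idx = select_poison_states(rng.normal(size=100), rng.normal(size=100), rng.normal(size=100))
print(len(idx))  # → 5
```

Restricting poisoning to high-|advantage| states keeps the poisoning budget small, which is consistent with the paper's claim that normal task performance is preserved.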

🛡️ Threat Analysis

Model Poisoning

DGBA is a backdoor/trojan attack on RL policies: visual patch triggers embedded via training-time poisoning activate hidden malicious behavior at inference, while the policy behaves normally otherwise. The advantage-based poisoning strategy and diffusion-guided trigger generation are novel contributions within the ML10 (Model Poisoning) category.


Details

Domains
reinforcement-learning, vision
Model Types
rl, diffusion
Threat Tags
black_box, training_time, targeted, physical
Datasets
TurtleBot3 real-world evaluation
Applications
robotic navigation, mobile robotics