
Dataset Poisoning Attacks on Behavioral Cloning Policies

Akansha Kalra , Soumil Datta , Ethan Gilmore , Duc La , Guanhong Tao , Daniel S. Brown

0 citations · 38 references · arXiv

Published on arXiv · arXiv:2511.20992

Model Poisoning

OWASP ML Top 10 — ML10

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

BC policies trained on even minimally poisoned demonstration datasets retain near-baseline task performance but are highly vulnerable to backdoor trigger attacks at deployment, highlighting a stealthy and practically dangerous threat.

Entropy-based trigger attack

Novel technique introduced


Behavior Cloning (BC) is a popular framework for training sequential decision policies from expert demonstrations via supervised learning. As these policies are increasingly being deployed in the real world, their robustness and potential vulnerabilities are an important concern. In this work, we perform the first analysis of the efficacy of clean-label backdoor attacks on BC policies. Our backdoor attacks poison a dataset of demonstrations by injecting a visual trigger to create a spurious correlation that can be exploited at test time. We evaluate how policy vulnerability scales with the fraction of poisoned data, the strength of the trigger, and the trigger type. We also introduce a novel entropy-based test-time trigger attack that substantially degrades policy performance by identifying critical states where test-time triggering of the backdoor is expected to be most effective at degrading performance. We empirically demonstrate that BC policies trained on even minimally poisoned datasets exhibit deceptively high, near-baseline task performance despite being highly vulnerable to backdoor trigger attacks during deployment. Our results underscore the urgent need for more research into the robustness of BC policies, particularly as large-scale datasets are increasingly used to train policies for real-world cyber-physical systems. Videos and code are available at https://sites.google.com/view/dataset-poisoning-in-bc.


Key Contributions

  • First systematic analysis of clean-label backdoor attacks on behavioral cloning policies, evaluating scaling with poisoned fraction, trigger strength, and trigger type.
  • Novel entropy-based test-time trigger attack that identifies critical states where triggering the backdoor most effectively degrades policy performance.
  • Empirical demonstration that minimally poisoned BC policies maintain deceptively high near-baseline task performance while remaining highly vulnerable to backdoor activation at deployment.
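The entropy-based test-time attack in the second bullet can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the BC policy outputs a discrete action distribution, and it uses low action entropy as a hypothetical proxy for "critical" states where presenting the trigger is most damaging. The function names, threshold, and the low-entropy criterion are all assumptions.

```python
import numpy as np

def action_entropy(probs, eps=1e-12):
    """Shannon entropy of a discrete action distribution (nats)."""
    p = np.clip(probs, eps, 1.0)
    return -np.sum(p * np.log(p), axis=-1)

def should_trigger(action_probs, threshold=0.5):
    """Hypothetical trigger rule: fire the backdoor only at states
    where the policy is confident (entropy below a threshold),
    standing in for the paper's critical-state identification."""
    return action_entropy(action_probs) < threshold
```

At deployment, an attacker would evaluate this rule on each observation and overlay the visual trigger only when it fires, concentrating the attack on a small number of states rather than triggering continuously.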

🛡️ Threat Analysis

Data Poisoning Attack

The attack mechanism is clean-label dataset poisoning of demonstration data; the paper explicitly evaluates how vulnerability scales with the fraction of poisoned data, making data poisoning both the attack vehicle and an independently analyzed variable.
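A clean-label poisoning step of this kind can be sketched as below: a small visual trigger patch is stamped onto a fraction of demonstration frames while the expert action labels are left untouched, so the poisoned examples still look correctly labeled. All parameter choices (patch size, location, value, poison fraction) are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def poison_demonstrations(frames, actions, poison_frac=0.05,
                          patch_size=6, patch_value=255, seed=0):
    """Clean-label poisoning sketch: add a visual trigger to a random
    subset of frames without modifying the expert actions (labels).
    frames: uint8 array of shape (N, H, W, C); actions: array of shape (N, ...).
    Returns the poisoned frames, the unchanged actions, and the poisoned indices."""
    rng = np.random.default_rng(seed)
    poisoned = frames.copy()
    n = len(frames)
    idx = rng.choice(n, size=max(1, int(poison_frac * n)), replace=False)
    for i in idx:
        # white square trigger in the top-left corner of the image
        poisoned[i, :patch_size, :patch_size, :] = patch_value
    return poisoned, actions, idx
```

Training a BC policy on the returned dataset creates the spurious trigger-to-action correlation that the test-time attack later exploits.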

Model Poisoning

Core contribution is injecting clean-label backdoor triggers into BC demonstration datasets, creating hidden targeted behavior that activates only when the visual trigger is present at test time — canonical backdoor/trojan attack.


Details

Domains
vision · reinforcement-learning
Model Types
cnn
Threat Tags
training_time · targeted · black_box · digital
Applications
behavioral cloning · imitation learning · robotic manipulation · cyber-physical systems