defense 2025

Density-Ratio Weighted Behavioral Cloning: Learning Control Policies from Corrupted Datasets

Shriram Karpoora Sundara Pandian, Ali Baheri

0 citations · 30 references · arXiv


Published on arXiv · 2510.01479

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

Weighted BC maintains near-optimal performance even at high contamination ratios, outperforming BC, BCQ, and BRAC across all evaluated poisoning protocols

Weighted BC (Density-Ratio Weighted Behavioral Cloning)

Novel technique introduced


Offline reinforcement learning (RL) enables policy optimization from fixed datasets, making it suitable for safety-critical applications where online exploration is infeasible. However, these datasets are often contaminated by adversarial poisoning, system errors, or low-quality samples, leading to degraded policy performance in standard behavioral cloning (BC) and offline RL methods. This paper introduces Density-Ratio Weighted Behavioral Cloning (Weighted BC), a robust imitation learning approach that uses a small, verified clean reference set to estimate trajectory-level density ratios via a binary discriminator. These ratios are clipped and used as weights in the BC objective to prioritize clean expert behavior while down-weighting or discarding corrupted data, without requiring knowledge of the contamination mechanism. We establish theoretical guarantees showing convergence to the clean expert policy, with finite-sample bounds that are independent of the contamination rate. A comprehensive evaluation framework covering four poisoning protocols (reward, state, transition, and action) is applied on continuous control benchmarks. Experiments demonstrate that Weighted BC maintains near-optimal performance even at high contamination ratios, outperforming baselines such as traditional BC, batch-constrained Q-learning (BCQ), and behavior regularized actor-critic (BRAC).
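The abstract's pipeline (binary discriminator trained against a small clean reference set → clipped trajectory-level density ratios → weighted BC objective) can be sketched on toy data. Everything below is illustrative, not the paper's implementation: the trajectory featurization, the logistic-regression discriminator, and the linear policy are placeholder choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def trajectory_features(traj):
    """Summarize a (T, state+action) trajectory as a fixed-length vector."""
    return np.concatenate([traj.mean(axis=0), traj.std(axis=0)])

def train_discriminator(clean_X, mixed_X, lr=0.5, steps=2000):
    """Logistic regression: clean reference set -> label 1, mixed dataset -> label 0."""
    X = np.vstack([clean_X, mixed_X])
    X = np.hstack([X, np.ones((len(X), 1))])  # bias column
    y = np.concatenate([np.ones(len(clean_X)), np.zeros(len(mixed_X))])
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def density_ratio_weights(feats, w, clip=10.0):
    """w(tau) = d(tau) / (1 - d(tau)), clipped so no single trajectory dominates."""
    X = np.hstack([feats, np.ones((len(feats), 1))])
    d = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
    return np.clip(d / np.maximum(1.0 - d, 1e-6), 0.0, clip)

def make_traj(n_steps, clean):
    """Toy expert: a = 0.5*s; poisoned trajectories flip actions and shift states."""
    s = rng.normal(0.0 if clean else 2.0, 1.0, n_steps)
    a = (0.5 if clean else -0.5) * s
    return np.stack([s, a], axis=1)

# 10 verified-clean reference trajectories; mixed set = 30 clean + 30 poisoned
reference = [make_traj(20, clean=True) for _ in range(10)]
mixed = [make_traj(20, clean=True) for _ in range(30)] + \
        [make_traj(20, clean=False) for _ in range(30)]

ref_f = np.array([trajectory_features(t) for t in reference])
mix_f = np.array([trajectory_features(t) for t in mixed])
disc = train_discriminator(ref_f, mix_f)
weights = density_ratio_weights(mix_f, disc)

# Weighted BC for a linear policy a = theta * s: weighted least squares over steps,
# each step inheriting its trajectory's density-ratio weight
s = np.concatenate([t[:, 0] for t in mixed])
a = np.concatenate([t[:, 1] for t in mixed])
step_w = np.repeat(weights, [len(t) for t in mixed])
theta_weighted = np.sum(step_w * s * a) / np.sum(step_w * s * s)
theta_plain = np.sum(s * a) / np.sum(s * s)
```

On this toy setup the clean trajectories receive systematically larger weights than the poisoned ones, so the weighted fit recovers a slope near the clean expert's 0.5 while the unweighted fit is dragged toward the poisoned behavior.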


Key Contributions

  • Weighted BC: a discriminator-based method that estimates trajectory-level density ratios from a small verified clean reference set to down-weight or discard corrupted trajectories in the behavioral cloning objective, requiring no knowledge of the contamination mechanism
  • Theoretical guarantees — uniform clean-risk approximation bounds independent of contamination rate, excess clean-risk guarantees, and decomposed finite-sample/discriminator/bias error analysis
  • Comprehensive poisoning evaluation framework covering reward, state, transition, and action corruption protocols at varying severities on continuous control benchmarks
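In symbols, the weighted BC objective described in the first bullet plausibly takes the following form (notation is mine, not quoted from the paper):

```latex
\hat{\pi} \;=\; \arg\min_{\pi}\; \frac{1}{N}\sum_{i=1}^{N}
  \min\!\big(\hat{w}(\tau_i),\, c\big)
  \sum_{(s,a)\in\tau_i} -\log \pi(a \mid s),
\qquad
\hat{w}(\tau) \;=\; \frac{d(\tau)}{1 - d(\tau)},
```

where \(d(\tau)\) is the discriminator's estimated probability that trajectory \(\tau\) comes from the clean reference distribution and \(c\) is the clipping threshold.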

🛡️ Threat Analysis

Data Poisoning Attack

The paper's primary contribution is a defense against data poisoning of offline RL training datasets — corrupted trajectories (reward, state, transition, and action poisoning) are identified and down-weighted via discriminator-estimated density ratios using a small clean reference set, directly addressing training-time data corruption attacks.


Details

Domains
reinforcement-learning
Model Types
rl
Threat Tags
training_time
Datasets
D4RL continuous control benchmarks
Applications
offline reinforcement learning, continuous control, autonomous vehicles, industrial robotics