defense arXiv Jan 12, 2026
Lucas Schott, Elies Gherbi, Hatem Hajri et al. · IRT SystemX · Sorbonne Université
Adaptive adversarial training for RL using reward-preserving attacks that calibrate perturbation strength to avoid making tasks unsolvable
Input Manipulation Attack reinforcement-learning
Adversarial training in reinforcement learning (RL) is challenging because perturbations cascade through trajectories and compound over time, so fixed-strength attacks are either overly destructive or too conservative. We propose reward-preserving attacks, which adapt adversarial strength so that an $\alpha$ fraction of the nominal-to-worst-case return gap remains achievable at each state. In deep RL, perturbation magnitudes $\eta$ are selected dynamically using a learned critic $Q((s,a),\eta)$ that estimates the expected return of $\alpha$-reward-preserving rollouts. For intermediate values of $\alpha$, this adaptive training yields policies that are robust across a wide range of perturbation magnitudes while preserving nominal performance, outperforming adversarial training with fixed or uniformly sampled perturbation radii.
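The selection rule described in the abstract can be sketched as follows: at each state, pick the strongest perturbation magnitude whose critic estimate still preserves an $\alpha$ fraction of the nominal-to-worst-case return gap. The snippet below is a minimal illustration, not the paper's implementation; it assumes the preserved return threshold is worst-case plus $\alpha$ times the gap, and the critic interface `q_critic(state, action, eta)`, the magnitude grid `eta_grid`, and the nominal/worst-case return estimates are hypothetical placeholders.

```python
import numpy as np

def select_perturbation_magnitude(q_critic, state, action, eta_grid,
                                  nominal_return, worst_return, alpha):
    """Pick the largest perturbation magnitude eta whose critic estimate
    still preserves an alpha fraction of the nominal-to-worst-case gap.

    Hypothetical interface: q_critic(state, action, eta) estimates the
    return of an alpha-reward-preserving rollout from (state, action)
    under perturbations of magnitude eta.
    """
    # Return that must remain achievable: worst-case plus alpha times the gap
    # (assumed reading of "alpha fraction of the gap remains achievable").
    threshold = worst_return + alpha * (nominal_return - worst_return)

    # Scan magnitudes from strongest to weakest and keep the first one that
    # still meets the threshold; fall back to no perturbation otherwise.
    for eta in sorted(eta_grid, reverse=True):
        if q_critic(state, action, eta) >= threshold:
            return eta
    return 0.0


# Toy usage with a dummy critic whose return estimate degrades linearly in eta.
if __name__ == "__main__":
    dummy_critic = lambda s, a, eta: 1.0 - 2.0 * eta
    eta = select_perturbation_magnitude(
        dummy_critic, state=np.zeros(4), action=0,
        eta_grid=np.linspace(0.0, 0.5, 11),
        nominal_return=1.0, worst_return=0.0, alpha=0.5,
    )
    print(f"selected eta = {eta:.2f}")
```

With an intermediate $\alpha$, the rule attacks aggressively in states where the critic predicts the task remains solvable and backs off where it would become unsolvable, which is the behavior the abstract attributes to reward-preserving attacks.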
rl · IRT SystemX · Sorbonne Université · Safran Electronics and Defense