defense 2025

Pruning Cannot Hurt Robustness: Certified Trade-offs in Reinforcement Learning

James Pedley , Benjamin Etheridge , Stephen J. Roberts , Francesco Quinzan

0 citations · 77 references · arXiv


Published on arXiv · 2510.12939

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Pruning provably tightens certified robustness bounds, and experiments reveal sweet spots at moderate sparsity where robustness improves substantially without harming clean performance

SA-MDP certified pruning framework

Novel technique introduced


Reinforcement learning (RL) policies deployed in real-world environments must remain reliable under adversarial perturbations. At the same time, modern deep RL agents are heavily over-parameterized, raising costs and fragility concerns. While pruning has been shown to improve robustness in supervised learning, its role in adversarial RL remains poorly understood. We develop the first theoretical framework for certified robustness under pruning in state-adversarial Markov decision processes (SA-MDPs). For Gaussian and categorical policies with Lipschitz networks, we prove that element-wise pruning can only tighten certified robustness bounds; pruning never makes the policy less robust. Building on this, we derive a novel three-term regret decomposition that disentangles clean-task performance, pruning-induced performance loss, and robustness gains, exposing a fundamental performance-robustness frontier. Empirically, we evaluate magnitude and micro-pruning schedules on continuous-control benchmarks with strong policy-aware adversaries. Across tasks, pruning consistently uncovers reproducible "sweet spots" at moderate sparsity levels, where robustness improves substantially without harming, and sometimes even enhancing, clean performance. These results position pruning not merely as a compression tool but as a structural intervention for robust RL.
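The element-wise magnitude pruning the abstract refers to can be sketched in a few lines. This is a generic illustration, not the paper's implementation: it zeroes the smallest-magnitude fraction of a weight matrix, which is the standard operation a magnitude-pruning schedule applies layer by layer.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Element-wise magnitude pruning: zero out the smallest-magnitude
    fraction of entries. `sparsity` is the fraction of weights removed."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
W_pruned = magnitude_prune(W, 0.5)
achieved_sparsity = 1.0 - np.count_nonzero(W_pruned) / W_pruned.size
```

A pruning *schedule* would simply call this repeatedly during training with a sparsity level that ramps up toward the target, retraining between steps.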


Key Contributions

  • First theoretical framework proving element-wise pruning can only tighten certified robustness bounds in SA-MDPs for Lipschitz networks with Gaussian and categorical policies
  • Novel three-term regret decomposition disentangling clean-task performance, pruning-induced loss, and robustness gains, exposing a fundamental performance-robustness frontier
  • Empirical validation on continuous-control benchmarks with policy-aware adversaries showing reproducible sweet spots at moderate sparsity levels
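The paper's exact three-term decomposition is not reproduced here, but its structure can be sketched as a telescoping identity under hypothetical notation: let $\pi^*$ be the optimal clean policy, $\pi$ the trained dense policy, $\hat\pi$ its pruned counterpart, and $J_\nu$ the return under adversary $\nu$. Then the adversarial regret splits as

```latex
J(\pi^*) - J_\nu(\hat\pi)
  = \underbrace{J(\pi^*) - J(\pi)}_{\text{clean-task gap}}
  + \underbrace{J(\pi) - J(\hat\pi)}_{\text{pruning-induced loss}}
  + \underbrace{J(\hat\pi) - J_\nu(\hat\pi)}_{\text{robustness gap}}
```

The performance-robustness frontier arises because increasing sparsity tends to grow the middle term while (per the certified bounds) it can only shrink the last.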

🛡️ Threat Analysis

Input Manipulation Attack

The paper defends against adversarial state perturbations in SA-MDPs (state-adversarial Markov decision processes), where an adversary perturbs RL policy inputs at inference time. The core contribution is certifying that pruning cannot decrease robustness against such input manipulation attacks, with formal bounds proven for Gaussian and categorical policies.
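The attack surface above can be illustrated with a minimal, gradient-free state adversary. This is a generic sketch, not the paper's policy-aware attack: it searches an $\ell_\infty$ ball of radius epsilon around the true state for the perturbation that most displaces the policy's action, and the toy linear policy is purely hypothetical.

```python
import numpy as np

def linf_random_adversary(policy, state, epsilon, n_trials=64, seed=0):
    """Gradient-free state adversary: sample perturbations inside the
    l-infinity ball of radius epsilon and keep the one that shifts the
    policy's action the most (a crude proxy for a policy-aware attack)."""
    rng = np.random.default_rng(seed)
    clean_action = policy(state)
    best_delta, best_shift = np.zeros_like(state), -1.0
    for _ in range(n_trials):
        delta = rng.uniform(-epsilon, epsilon, size=state.shape)
        shift = np.linalg.norm(policy(state + delta) - clean_action)
        if shift > best_shift:
            best_shift, best_delta = shift, delta
    return state + best_delta

# Toy deterministic linear policy, for illustration only.
W = np.array([[1.0, -0.5], [0.3, 0.8]])
policy = lambda s: W @ s
s = np.array([0.2, -0.1])
s_adv = linf_random_adversary(policy, s, epsilon=0.05)
```

A Lipschitz bound on the policy network caps how far `policy(s_adv)` can drift from `policy(s)` for any perturbation in the ball, which is what the certified robustness bounds formalize.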


Details

Domains
reinforcement-learning
Model Types
rl
Threat Tags
white_box, inference_time
Datasets
continuous-control benchmarks (MuJoCo-style)
Applications
continuous-control reinforcement learning, robust RL deployment