Gradient Inversion in Federated Reinforcement Learning
Published on arXiv (2512.00303)
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
RGIA successfully reconstructs private state-action-reward-next-state tuples in FRL by enforcing transition-dynamics regularization, overcoming the pseudo-solution problem that defeats standard gradient inversion methods.
RGIA (Regularization Gradient Inversion Attack)
Novel technique introduced
Federated reinforcement learning (FRL) enables distributed learning of optimal policies while preserving local data privacy through gradient sharing. However, FRL faces the risk of data privacy leakage: attackers can exploit shared gradients to reconstruct local training data. Compared to traditional supervised federated learning, successful reconstruction in FRL requires the generated data not only to match the shared gradients but also to align with the real transition dynamics of the environment (i.e., the true data transition distribution). To address this issue, we propose a novel attack method called Regularization Gradient Inversion Attack (RGIA), which enforces prior-knowledge-based regularization on states, rewards, and transition dynamics during optimization to ensure that the reconstructed data remain close to the true transition distribution. Theoretically, we prove that the prior-knowledge-based regularization term narrows the solution space from a broad set containing spurious solutions to a constrained subset that satisfies both gradient matching and the true transition dynamics. Extensive experiments on control and autonomous driving tasks demonstrate that RGIA effectively constrains the transition distribution of the reconstructed data and thus successfully reconstructs local private data.
Key Contributions
- Identifies the pseudo-solution problem unique to FRL gradient inversion: reconstructed data must match gradients AND conform to true environment transition dynamics
- Proposes RGIA, which enforces prior-knowledge-based regularization over states, rewards, and transition dynamics to constrain the solution space to physically valid trajectories
- Theoretical proof that the regularization term narrows the solution space from a broad spurious set to a constrained subset satisfying both gradient matching and true transition dynamics
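The core idea above — minimize a gradient-matching loss plus a dynamics-prior penalty so that only physically valid tuples survive — can be sketched on a toy problem. Everything in this sketch is a simplifying assumption, not the paper's setup: a linear TD(0) value model with weights `w` known to the attacker, a known linear dynamics prior `A`, and analytic gradients in place of autodiff. The attacker optimizes a dummy (state, reward, next-state) tuple to match the shared gradient `g_star` while penalizing deviation from the dynamics prior.

```python
# Toy sketch of an RGIA-style regularized gradient inversion attack.
# NOT the paper's implementation: the linear TD(0) value model, the known
# weights w, and the linear dynamics prior A are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, gamma, lam = 3, 0.9, 1.0   # state dim, discount, regularization weight

w = rng.normal(size=d)        # value-model weights (known to the attacker)
A = 0.8 * np.eye(d)           # prior knowledge: approximate environment dynamics

# Victim's private transition (s, r, s'), generated by the true dynamics.
s_true = rng.normal(size=d)
sp_true = A @ s_true
r_true = 1.0

def td_gradient(s, r, sp):
    """Gradient of the TD(0) loss delta**2 w.r.t. w, delta = r + gamma*w@sp - w@s."""
    delta = r + gamma * (w @ sp) - (w @ s)
    return 2.0 * delta * (gamma * sp - s), delta

g_star, _ = td_gradient(s_true, r_true, sp_true)  # gradient the victim shares

def objective(s, r, sp):
    # Gradient matching + transition-dynamics regularization (the RGIA idea):
    # without the second term, many pseudo-solutions also match g_star.
    G, _ = td_gradient(s, r, sp)
    return float(np.sum((G - g_star) ** 2) + lam * np.sum((sp - A @ s) ** 2))

def gradients(s, r, sp):
    # Analytic gradients of objective() w.r.t. the dummy tuple (s, r, s').
    G, delta = td_gradient(s, r, sp)
    u = gamma * sp - s          # direction of the TD gradient
    D = G - g_star              # gradient-matching residual
    p = sp - A @ s              # dynamics-prior residual
    gs = -4 * (D @ u) * w - 4 * delta * D - 2 * lam * (A.T @ p)
    gr = 4 * (D @ u)
    gsp = 4 * gamma * ((D @ u) * w + delta * D) + 2 * lam * p
    return gs, gr, gsp

# Attacker: start from a random dummy tuple, descend with simple backtracking.
s, r, sp = rng.normal(size=d), 0.0, rng.normal(size=d)
obj0 = objective(s, r, sp)
obj, lr = obj0, 1e-2
for _ in range(20000):
    gs, gr, gsp = gradients(s, r, sp)
    ns, nr, nsp = s - lr * gs, r - lr * gr, sp - lr * gsp
    nobj = objective(ns, nr, nsp)
    if nobj <= obj:             # accept the step and grow the step size
        s, r, sp, obj = ns, nr, nsp, nobj
        lr *= 1.05
    else:                       # reject the step and backtrack
        lr *= 0.5

dynamics_residual = float(np.sum((sp - A @ s) ** 2))
```

The design choice mirrors the paper's claim about the solution space: with `lam = 0` the objective is underdetermined (seven unknowns, three gradient-matching equations), so descent lands on a pseudo-solution; the dynamics penalty shrinks the feasible set toward tuples consistent with the environment. The weight `lam` trades gradient fidelity against prior fidelity.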
🛡️ Threat Analysis
The paper's primary contribution is an adversarial attack (RGIA) in which an attacker reconstructs private local training data ((state, action, reward, next-state) tuples) from gradients shared in federated reinforcement learning: a direct gradient-leakage / data-reconstruction threat model.