From Risk to Resilience: Towards Assessing and Mitigating the Risk of Data Reconstruction Attacks in Federated Learning
Xiangrui Xu 1,2, Zhize Li 3, Yufei Han 1, Bin Wang 4, Jiqiang Liu 1,5, Wei Wang 2
Published on arXiv
2512.15460
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
DRA risk in FL is governed by spectral properties of the Jacobian of shared model updates, providing a unified theoretical explanation for why existing defenses work and enabling adaptive noise-based defenses that protect privacy without accuracy loss.
InvLoss / InvRE
Novel techniques introduced
Data Reconstruction Attacks (DRAs) pose a significant threat to Federated Learning (FL) systems by enabling adversaries to infer sensitive training data from local clients. Despite extensive research, the question of how to characterize and assess the risk of DRAs in FL systems remains unresolved due to the lack of a theoretically grounded risk quantification framework. In this work, we address this gap by introducing Invertibility Loss (InvLoss) to quantify the maximum achievable effectiveness of DRAs for a given data instance and FL model. We derive a tight and computable upper bound for InvLoss and explore its implications from three perspectives. First, we show that DRA risk is governed by the spectral properties of the Jacobian matrix of exchanged model updates or feature embeddings, providing a unified explanation for the effectiveness of existing defense methods. Second, we develop InvRE, an InvLoss-based DRA risk estimator that offers attack-method-agnostic, comprehensive risk evaluation across data instances and model architectures. Third, we propose two adaptive noise perturbation defenses that enhance FL privacy without harming classification accuracy. Extensive experiments on real-world datasets validate our framework, demonstrating its potential for systematic DRA risk evaluation and mitigation in FL systems.
Key Contributions
- InvLoss: a theoretically grounded measure of the maximum achievable effectiveness of data reconstruction attacks for a given data instance and FL model, together with a tight, computable upper bound derived from spectral properties of the Jacobian of exchanged gradients/embeddings
- InvRE: an attack-method-agnostic DRA risk estimator enabling comprehensive evaluation across data instances and model architectures without requiring a specific attack implementation
- Two adaptive noise perturbation defenses that improve FL privacy guarantees without degrading classification accuracy
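The core intuition behind the Jacobian-based risk measure can be illustrated with a toy example: if the smallest singular value of the Jacobian of the shared mapping (here, a released feature embedding) is large, the mapping is well conditioned for inversion, so reconstruction of the input is easier. The sketch below is illustrative only; `jacobian_fd`, `invertibility_proxy`, and the toy `tanh` layer are our own stand-ins, not the paper's InvLoss or InvRE implementation.

```python
import numpy as np

def jacobian_fd(f, x, eps=1e-5):
    """Finite-difference Jacobian of f: R^n -> R^m at x."""
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        xp = x.copy()
        xp[i] += eps
        J[:, i] = (f(xp) - fx) / eps
    return J

def invertibility_proxy(f, x):
    """Smallest singular value of the Jacobian at x: larger values mean
    inversion is better conditioned, i.e. higher reconstruction risk for
    this particular instance (a crude proxy, not the paper's InvLoss)."""
    J = jacobian_fd(f, x)
    return np.linalg.svd(J, compute_uv=False).min()

# Toy "shared representation": a fixed linear layer with tanh activation.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
f = lambda x: np.tanh(W @ x)

x = rng.normal(size=4)
risk = invertibility_proxy(f, x)
```

Because the proxy is computed per instance, it naturally exposes the instance-level variation in DRA risk that the paper's attack-agnostic estimator is designed to capture.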
🛡️ Threat Analysis
The paper's entire focus is on Data Reconstruction Attacks in FL, where an adversary reconstructs private client training data from exchanged model updates or feature embeddings. The paper introduces InvLoss to bound DRA effectiveness, develops InvRE as an attack-agnostic risk estimator, and proposes noise perturbation defenses evaluated against reconstruction attacks — a textbook ML03 scenario.
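One way such a spectral risk measure can drive mitigation is to scale per-instance noise by the measured invertibility before releasing an embedding. The sketch below is our own minimal illustration of that idea, not the paper's actual defense: the toy `tanh` layer, `defended_embedding`, and the linear noise schedule `base_sigma * s_min` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 4))          # toy shared layer
f = lambda x: np.tanh(W @ x)

def defended_embedding(x, base_sigma=0.05):
    """Release f(x) with instance-adaptive Gaussian noise: the noise scale
    grows with the smallest singular value of the Jacobian at x, i.e. with
    how well conditioned inversion is for this particular input."""
    z = f(x)
    J = np.diag(1.0 - z**2) @ W      # exact Jacobian of tanh(W @ x)
    s_min = np.linalg.svd(J, compute_uv=False).min()
    return z + rng.normal(scale=base_sigma * s_min, size=z.shape)

x = rng.normal(size=4)
z_noisy = defended_embedding(x)
```

Instances whose mappings are nearly non-invertible receive little noise, which is the mechanism by which an adaptive defense can target privacy risk without uniformly degrading the signal used for classification.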