TIP: Resisting Gradient Inversion via Targeted Interpretable Perturbation in Federated Learning
Published on arXiv
2602.11633
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
TIP renders GIA-reconstructed images visually unrecognizable while maintaining model accuracy comparable to non-private baselines, significantly outperforming standard DP-based defenses in the privacy-utility trade-off
TIP (Targeted Interpretable Perturbation)
Novel technique introduced
Federated Learning (FL) facilitates collaborative model training while preserving data locality; however, the exchange of gradients renders the system vulnerable to Gradient Inversion Attacks (GIAs), allowing adversaries to reconstruct private training data with high fidelity. Existing defenses, such as Differential Privacy (DP), typically employ indiscriminate noise injection across all parameters, which severely degrades model utility and convergence stability. To address this limitation, we propose Targeted Interpretable Perturbation (TIP), a novel defense framework that integrates model interpretability with frequency-domain analysis. Unlike conventional methods that treat parameters uniformly, TIP introduces a dual-targeting strategy. First, leveraging Gradient-weighted Class Activation Mapping (Grad-CAM) to quantify channel sensitivity, we dynamically identify the critical convolution channels that encode primary semantic features. Second, we transform these selected kernels into the frequency domain via the Discrete Fourier Transform and selectively inject calibrated perturbations into the high-frequency spectrum. By perturbing only high-frequency components, TIP destroys the fine-grained details necessary for image reconstruction while preserving the low-frequency information crucial for model accuracy. Extensive experiments on benchmark datasets demonstrate that TIP renders reconstructed images visually unrecognizable against state-of-the-art GIAs while maintaining global model accuracy comparable to non-private baselines, significantly outperforming existing DP-based defenses in both the privacy-utility trade-off and interpretability. Code is available at https://github.com/2766733506/asldkfjssdf_arxiv
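The frequency-domain half of the dual-targeting strategy can be illustrated with a short sketch. This is not the paper's implementation: the radial `cutoff_ratio` and the Gaussian `noise_std` are hypothetical stand-ins for the paper's calibrated perturbation, and the kernel here is a plain 2-D array rather than a selected convolution channel.

```python
import numpy as np

def perturb_high_freq(kernel, cutoff_ratio=0.5, noise_std=0.1, seed=None):
    """Inject Gaussian noise into the high-frequency DFT bins of a 2-D kernel.

    cutoff_ratio and noise_std are illustrative parameters, not values
    from the paper. Low-frequency bins (including the DC term) are left
    untouched, so coarse kernel structure is preserved.
    """
    rng = np.random.default_rng(seed)
    spec = np.fft.fftshift(np.fft.fft2(kernel))      # centre low frequencies
    h, w = kernel.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)        # distance from DC bin
    high = dist > cutoff_ratio * (min(h, w) / 2)     # high-frequency mask
    noise = (rng.normal(0, noise_std, spec.shape)
             + 1j * rng.normal(0, noise_std, spec.shape))
    spec = spec + high * noise                       # perturb high bins only
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec)))

# The DC component (kernel mean) survives the perturbation almost exactly.
k = np.ones((5, 5)) / 25.0
k_p = perturb_high_freq(k, noise_std=0.05, seed=0)
print(abs(k.mean() - k_p.mean()))  # near zero: low-frequency info preserved
```

The design point this sketch makes is the same as the paper's: the perturbation budget is spent only where reconstruction-critical detail lives, not uniformly over all parameters as in standard DP noise injection.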
Key Contributions
- Dual-targeting strategy combining Grad-CAM channel sensitivity analysis with frequency-domain kernel selection to identify where perturbation matters most
- Selective injection of calibrated noise into high-frequency spectral components of critical convolution kernels, disrupting GIA reconstruction while preserving low-frequency features needed for model accuracy
- Demonstrates a significantly improved privacy-utility trade-off over DP baselines, rendering reconstructed images visually unrecognizable while matching non-private model accuracy
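The channel-selection half of the strategy follows the Grad-CAM recipe: gradients are global-average-pooled over spatial dimensions to get per-channel weights, and channels whose weighted activation maps carry the most energy are treated as critical. The scoring rule and `top_k` selection below are an assumed reading of the paper's sensitivity measure, not its exact formulation.

```python
import numpy as np

def channel_sensitivity(activations, gradients):
    """Grad-CAM-style channel importance for one conv layer.

    activations, gradients: (C, H, W) arrays. The per-channel weight is
    the spatially averaged gradient (Grad-CAM's alpha_c); the score is the
    energy of the weighted activation map. The energy score itself is an
    illustrative assumption.
    """
    weights = gradients.mean(axis=(1, 2))            # alpha_c
    contrib = weights[:, None, None] * activations   # weighted maps
    return np.abs(contrib).sum(axis=(1, 2))          # one score per channel

def top_k_channels(scores, k):
    """Indices of the k most sensitive channels, highest score first."""
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 7, 7))
grads = rng.normal(size=(8, 7, 7))
scores = channel_sensitivity(acts, grads)
critical = top_k_channels(scores, k=3)  # channels to perturb first
```

Only these `critical` channels would then receive the frequency-domain perturbation, leaving the remaining kernels unmodified.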
🛡️ Threat Analysis
Directly defends against Gradient Inversion Attacks (GIAs), in which an adversary reconstructs private training data from shared FL gradients — the canonical gradient-leakage/reconstruction threat. The paper proposes TIP as a targeted perturbation defense and evaluates it against state-of-the-art GIAs (e.g., DLG).
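To see why shared gradients leak training data at all, consider the well-known analytic case of a fully-connected layer: for y = Wx + b with error signal δ = ∂L/∂y, the shared gradients are ∂L/∂W = δxᵀ and ∂L/∂b = δ, so dividing any row of the weight gradient by the matching bias gradient recovers the input x exactly. This toy sketch is an illustration of the threat model, not an attack from the paper.

```python
import numpy as np

# Toy fully-connected layer y = W x + b with squared-error loss.
rng = np.random.default_rng(0)
x = rng.normal(size=4)                   # private training input
W = rng.normal(size=(3, 4))
b = rng.normal(size=3)
y_target = rng.normal(size=3)

delta = (W @ x + b) - y_target           # dL/dy for L = 0.5 * ||y - t||^2
grad_W = np.outer(delta, x)              # dL/dW = delta x^T  (shared in FL)
grad_b = delta                           # dL/db = delta      (shared in FL)

# Attacker recovers x exactly from any row with nonzero delta:
x_rec = grad_W[0] / grad_b[0]
print(np.allclose(x_rec, x))             # True
```

Optimization-based GIAs such as DLG generalize this idea to deep networks by iteratively fitting dummy inputs whose gradients match the shared ones; TIP's targeted high-frequency perturbation is aimed at breaking exactly that matching signal.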