TIP: Resisting Gradient Inversion via Targeted Interpretable Perturbation in Federated Learning
Published on arXiv
2602.11633
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
TIP renders GIA-reconstructed images visually unrecognizable while maintaining model accuracy comparable to non-private baselines, significantly outperforming standard DP-based defenses in the privacy-utility trade-off
TIP (Targeted Interpretable Perturbation)
Novel technique introduced
Federated Learning (FL) facilitates collaborative model training while preserving data locality; however, the exchange of gradients renders the system vulnerable to Gradient Inversion Attacks (GIAs), allowing adversaries to reconstruct private training data with high fidelity. Existing defenses, such as Differential Privacy (DP), typically employ indiscriminate noise injection across all parameters, which severely degrades model utility and convergence stability. To address this limitation, we propose Targeted Interpretable Perturbation (TIP), a novel defense framework that integrates model interpretability with frequency-domain analysis. Unlike conventional methods that treat parameters uniformly, TIP introduces a dual-targeting strategy. First, leveraging Gradient-weighted Class Activation Mapping (Grad-CAM) to quantify channel sensitivity, we dynamically identify the critical convolution channels that encode primary semantic features. Second, we transform these selected kernels into the frequency domain via the Discrete Fourier Transform and selectively inject calibrated perturbations into the high-frequency spectrum. By perturbing only high-frequency components, TIP destroys the fine-grained details necessary for image reconstruction while preserving the low-frequency information crucial for model accuracy. Extensive experiments on benchmark datasets demonstrate that TIP renders reconstructed images visually unrecognizable against state-of-the-art GIAs while maintaining global model accuracy comparable to non-private baselines, significantly outperforming existing DP-based defenses in both the privacy-utility trade-off and interpretability. Code is available at https://github.com/2766733506/asldkfjssdf_arxiv
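The frequency-domain half of the dual-targeting strategy can be illustrated with a short sketch. This is not the paper's implementation: the radial `cutoff_ratio` and the Gaussian `noise_std` are hypothetical stand-ins for the paper's calibrated perturbation, and the kernel here is a plain 2-D array rather than a selected convolution channel.

```python
import numpy as np

def perturb_high_freq(kernel, cutoff_ratio=0.5, noise_std=0.1, seed=None):
    """Inject Gaussian noise into the high-frequency DFT bins of a 2-D kernel.

    cutoff_ratio and noise_std are illustrative parameters, not values
    from the paper. Low-frequency bins (including the DC term) are left
    untouched, so coarse kernel structure is preserved.
    """
    rng = np.random.default_rng(seed)
    spec = np.fft.fftshift(np.fft.fft2(kernel))      # centre low frequencies
    h, w = kernel.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)        # distance from DC bin
    high = dist > cutoff_ratio * (min(h, w) / 2)     # high-frequency mask
    noise = (rng.normal(0, noise_std, spec.shape)
             + 1j * rng.normal(0, noise_std, spec.shape))
    spec = spec + high * noise                       # perturb high bins only
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec)))

# The DC component (kernel mean) survives the perturbation almost exactly.
k = np.ones((5, 5)) / 25.0
k_p = perturb_high_freq(k, noise_std=0.05, seed=0)
print(abs(k.mean() - k_p.mean()))  # near zero: low-frequency info preserved
```

The design point this sketch makes is the same as the paper's: the perturbation budget is spent only where reconstruction-critical detail lives, not uniformly over all parameters as in standard DP noise injection.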
Key Contributions
- Dual-targeting strategy combining Grad-CAM channel sensitivity analysis with frequency-domain kernel selection to identify where perturbation matters most
- Selective injection of calibrated noise into high-frequency spectral components of critical convolution kernels, disrupting GIA reconstruction while preserving low-frequency features needed for model accuracy
- Demonstrates a significantly improved privacy-utility trade-off over DP baselines, rendering reconstructed images visually unrecognizable while matching non-private model accuracy
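The channel-selection half of the strategy follows the Grad-CAM recipe: gradients are global-average-pooled over spatial dimensions to get per-channel weights, and channels whose weighted activation maps carry the most energy are treated as critical. The scoring rule and `top_k` selection below are an assumed reading of the paper's sensitivity measure, not its exact formulation.

```python
import numpy as np

def channel_sensitivity(activations, gradients):
    """Grad-CAM-style channel importance for one conv layer.

    activations, gradients: (C, H, W) arrays. The per-channel weight is
    the spatially averaged gradient (Grad-CAM's alpha_c); the score is the
    energy of the weighted activation map. The energy score itself is an
    illustrative assumption.
    """
    weights = gradients.mean(axis=(1, 2))            # alpha_c
    contrib = weights[:, None, None] * activations   # weighted maps
    return np.abs(contrib).sum(axis=(1, 2))          # one score per channel

def top_k_channels(scores, k):
    """Indices of the k most sensitive channels, highest score first."""
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 7, 7))
grads = rng.normal(size=(8, 7, 7))
scores = channel_sensitivity(acts, grads)
critical = top_k_channels(scores, k=3)  # channels to perturb first
```

Only these `critical` channels would then receive the frequency-domain perturbation, leaving the remaining kernels unmodified.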
🛡️ Threat Analysis
Directly defends against Gradient Inversion Attacks (GIAs), in which an adversary reconstructs private training data from shared FL gradients — the canonical gradient-leakage/reconstruction threat. The paper proposes TIP as a targeted perturbation defense and evaluates it against state-of-the-art GIAs (e.g., DLG).
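To see why shared gradients leak training data at all, consider the well-known analytic case of a fully-connected layer: for y = Wx + b with error signal δ = ∂L/∂y, the shared gradients are ∂L/∂W = δxᵀ and ∂L/∂b = δ, so dividing any row of the weight gradient by the matching bias gradient recovers the input x exactly. This toy sketch is an illustration of the threat model, not an attack from the paper.

```python
import numpy as np

# Toy fully-connected layer y = W x + b with squared-error loss.
rng = np.random.default_rng(0)
x = rng.normal(size=4)                   # private training input
W = rng.normal(size=(3, 4))
b = rng.normal(size=3)
y_target = rng.normal(size=3)

delta = (W @ x + b) - y_target           # dL/dy for L = 0.5 * ||y - t||^2
grad_W = np.outer(delta, x)              # dL/dW = delta x^T  (shared in FL)
grad_b = delta                           # dL/db = delta      (shared in FL)

# Attacker recovers x exactly from any row with nonzero delta:
x_rec = grad_W[0] / grad_b[0]
print(np.allclose(x_rec, x))             # True
```

Optimization-based GIAs such as DLG generalize this idea to deep networks by iteratively fitting dummy inputs whose gradients match the shared ones; TIP's targeted high-frequency perturbation is aimed at breaking exactly that matching signal.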