Defense · 2025

On the Detectability of Active Gradient Inversion Attacks in Federated Learning

Vincenzo Carletti, Pasquale Foggia, Carlo Mazzocca, Giuseppe Parrella, Mario Vento

Published on arXiv · arXiv:2511.10502 · 0 citations

Model Inversion Attack

OWASP ML Top 10 — ML03

Key Finding

Proposed client-side detection techniques effectively identify active GIAs across diverse FL configurations without any modifications to the standard training protocol, exposing the limited real-world stealthiness of recent attacks.


One of the key advantages of Federated Learning (FL) is its ability to collaboratively train a Machine Learning (ML) model while keeping clients' data on-site. However, this can create a false sense of security. Although keeping private data local increases overall privacy, prior studies have shown that the gradients exchanged during FL training remain vulnerable to Gradient Inversion Attacks (GIAs). These attacks reconstruct clients' local data, breaking the privacy promise of FL. GIAs can be launched by either a passive or an active server. In the latter case, a malicious server manipulates the global model to facilitate data reconstruction. While effective, earlier attacks in this category have been shown to be detectable by clients, limiting their real-world applicability. Recently, novel active GIAs have emerged, claiming to be far stealthier than previous approaches. This work provides the first comprehensive analysis of these claims, investigating four state-of-the-art GIAs. We propose novel lightweight client-side detection techniques based on statistically improbable weight structures and anomalous loss and gradient dynamics. Extensive evaluation across several configurations demonstrates that our methods enable clients to effectively detect active GIAs without any modifications to the FL training protocol.
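To make the underlying threat concrete, here is a minimal toy sketch (not taken from the paper) of why shared gradients leak inputs: for a fully connected layer y = Wx + b with scalar loss L, the gradients satisfy dL/dW = (dL/dy)·xᵀ and dL/db = dL/dy, so a server can recover the input analytically by dividing a row of the weight gradient by the corresponding bias gradient.

```python
import numpy as np

# Toy illustration of gradient leakage from a fully connected layer
# (our construction, not the paper's). Since dL/dW = (dL/dy) x^T and
# dL/db = dL/dy, each input coordinate is x_j = grad_W[i, j] / grad_b[i]
# for any row i whose bias gradient is nonzero.
rng = np.random.default_rng(0)
x = rng.normal(size=4)            # private client input
dy = rng.normal(size=3)           # upstream gradient dL/dy (arbitrary here)

grad_W = np.outer(dy, x)          # what the server observes: dL/dW
grad_b = dy                       # and dL/db

i = int(np.argmax(np.abs(grad_b)))  # pick a row with a large bias gradient
x_rec = grad_W[i] / grad_b[i]       # analytic reconstruction of the input

assert np.allclose(x_rec, x)
```

This analytic shortcut only works when per-sample gradients reach the server; active GIAs manipulate the model precisely to recreate such favorable conditions under aggregation.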


Key Contributions

  • First comprehensive analysis of stealthiness claims of four state-of-the-art active Gradient Inversion Attacks in federated learning
  • Novel lightweight client-side detection methods based on statistically improbable weight structures and anomalous loss/gradient dynamics
  • Demonstrated detection effectiveness across multiple FL configurations without requiring modifications to the standard FL protocol

🛡️ Threat Analysis

Model Inversion Attack

Gradient Inversion Attacks (GIAs) are the canonical ML03 threat: an adversary (here, a malicious FL server) reconstructs clients' private training data from shared gradients. The paper analyzes four state-of-the-art active GIAs and proposes defenses (detection techniques) directly against this data reconstruction threat in federated learning.
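The "statistically improbable weight structures" idea can be illustrated with a toy check (our construction, with illustrative thresholds; the paper's concrete tests may differ): some active GIAs implant near-duplicate rows in a fully connected layer to funnel individual inputs through it, and a random benign initialization makes such row collisions vanishingly unlikely.

```python
import numpy as np

# Illustrative client-side structural check (not the paper's exact test):
# flag a received weight matrix if an unusually large fraction of its
# row pairs are near-parallel, which is statistically improbable for
# benignly trained or randomly initialized layers.
def improbable_row_structure(W, cos_thresh=0.999, max_dup_frac=0.05):
    """Flag W if too many row pairs have |cosine similarity| > cos_thresh."""
    W = np.asarray(W, dtype=float)
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    U = W / np.clip(norms, 1e-12, None)      # unit-normalize rows
    cos = U @ U.T                            # pairwise cosine similarities
    iu = np.triu_indices(W.shape[0], k=1)    # upper triangle: distinct pairs
    dup_frac = np.mean(np.abs(cos[iu]) > cos_thresh)
    return bool(dup_frac > max_dup_frac)

rng = np.random.default_rng(1)
benign = rng.normal(size=(64, 128))                      # rows ~ orthogonal
malicious = np.tile(rng.normal(size=(1, 128)), (64, 1))  # implanted duplicates
print(improbable_row_structure(benign))      # False
print(improbable_row_structure(malicious))   # True
```

Because the check only reads the weights the client already receives each round, it fits the paper's constraint of requiring no modification to the standard FL training protocol.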


Details

Domains
federated-learning
Model Types
federated
Threat Tags
white_box, training_time
Applications
federated learning systems