KeTS: Kernel-based Trust Segmentation against Model Poisoning Attacks
Ankit Gangwal, Mauro Conti, Tommaso Pauselli
Published on arXiv: 2501.06729
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
KeTS outperforms the best existing defense by >24% on MNIST, >14% on Fashion-MNIST, >9% on CIFAR-10, and >11% on KDD-CUP-1999 across all six model poisoning attack settings.
KeTS (Kernel-based Trust Segmentation)
Novel technique introduced
Federated Learning (FL) enables multiple users to collaboratively train a global model in a distributed manner without revealing their personal data. However, FL remains vulnerable to model poisoning attacks, where malicious actors inject crafted updates to compromise the global model's accuracy. We propose a novel defense mechanism, Kernel-based Trust Segmentation (KeTS), to counter model poisoning attacks. Unlike existing approaches, KeTS analyzes the evolution of each client's updates and effectively segments malicious clients using Kernel Density Estimation (KDE), even in the presence of benign outliers. We thoroughly evaluate KeTS's performance against the six most effective model poisoning attacks (i.e., Trim-Attack, Krum-Attack, Min-Max attack, Min-Sum attack, and their variants) on four different datasets (i.e., MNIST, Fashion-MNIST, CIFAR-10, and KDD-CUP-1999) and compare its performance with three classical robust schemes (i.e., Krum, Trim-Mean, and Median) and a state-of-the-art defense (i.e., FLTrust). Our results show that KeTS outperforms the existing defenses in every attack setting, beating the best-performing defense by an overall average of >24% (on MNIST), >14% (on Fashion-MNIST), >9% (on CIFAR-10), and >11% (on KDD-CUP-1999). A series of further experiments (varying poisoning approaches, attacker population, etc.) reveals the consistent and superior performance of KeTS under diverse conditions. KeTS is a practical solution, as it satisfies all three defense objectives (i.e., fidelity, robustness, and efficiency) without imposing additional overhead on the clients. Finally, we also discuss a simple, yet effective extension to KeTS to handle consistent-untargeted attacks (e.g., sign-flipping) as well as targeted attacks (e.g., label-flipping).
Key Contributions
- KeTS computes per-client trust scores by analyzing the temporal evolution of each client's model updates, then segments benign from malicious clients using Kernel Density Estimation — robust to benign outliers in non-IID settings.
- Empirically evaluated against six untargeted model poisoning attacks in white-box scenarios on MNIST, Fashion-MNIST, CIFAR-10, and KDD-CUP-1999, outperforming Krum, Trim-Mean, Median, and FLTrust across all settings.
- Satisfies all three defense objectives (fidelity, robustness, efficiency) with no additional overhead on clients; extended to handle sign-flipping and label-flipping attacks.
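The KDE-based segmentation step described in the contributions above can be sketched as follows. This is an illustrative reconstruction, not the paper's exact procedure: it assumes per-client trust scores have already been computed (the hypothetical values below stand in for them), fits a Gaussian KDE over the scores, and splits clients at the deepest density valley, so a few benign outliers do not shift the cut the way a fixed threshold would.

```python
import numpy as np
from scipy.stats import gaussian_kde

def segment_clients(trust_scores, grid_size=512):
    """Split clients into trusted/untrusted groups by locating the
    deepest local minimum of a Gaussian KDE fitted over their trust
    scores. Illustrative sketch only (not KeTS's exact algorithm)."""
    scores = np.asarray(trust_scores, dtype=float)
    kde = gaussian_kde(scores)
    xs = np.linspace(scores.min(), scores.max(), grid_size)
    density = kde(xs)
    # Interior local minima of the density are candidate split points.
    is_valley = (density[1:-1] < density[:-2]) & (density[1:-1] < density[2:])
    valleys = np.where(is_valley)[0] + 1
    if valleys.size == 0:
        # Unimodal density: no separable malicious cluster detected.
        return np.ones(scores.shape, dtype=bool)
    threshold = xs[valleys[np.argmin(density[valleys])]]
    return scores >= threshold  # True = trusted (higher trust score)

# Hypothetical trust scores: benign clients cluster high, attackers low.
scores = [0.91, 0.88, 0.95, 0.90, 0.12, 0.15, 0.87, 0.10]
mask = segment_clients(scores)
```

Cutting at a density valley rather than at a fixed percentile is what makes this style of segmentation robust when the attacker fraction is unknown.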
🛡️ Threat Analysis
The primary contribution is a defense against untargeted Byzantine model poisoning attacks in federated learning (Trim-Attack, Krum-Attack, Min-Max, Min-Sum), where malicious clients inject crafted updates to degrade global model accuracy — the canonical ML02 federated learning threat.
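To make the threat concrete, the simplest consistent-untargeted attack in this family is sign-flipping (the case the paper's extension addresses): a malicious client negates, and typically scales, its honest local update before submission, pushing the aggregated global model away from the descent direction. A minimal hypothetical sketch (the `boost` scale factor is an illustrative assumption):

```python
import numpy as np

def sign_flip_update(honest_update, boost=5.0):
    """Consistent-untargeted poisoning: submit the negated (and scaled)
    honest local update so aggregation moves the global model in the
    ascent direction. Illustrative only."""
    return -boost * np.asarray(honest_update, dtype=float)

honest = np.array([0.2, -0.1, 0.05])     # honest local gradient update
malicious = sign_flip_update(honest)     # what the attacker submits
```

The optimization-based attacks named above (Trim-Attack, Krum-Attack, Min-Max, Min-Sum) are stealthier variants of the same goal: they craft the malicious direction and magnitude specifically to evade a given robust aggregation rule.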