KeTS: Kernel-based Trust Segmentation against Model Poisoning Attacks
Ankit Gangwal, Mauro Conti, Tommaso Pauselli
Published on arXiv: 2501.06729
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
KeTS outperforms the best existing defense by >24% on MNIST, >14% on Fashion-MNIST, >9% on CIFAR-10, and >11% on KDD-CUP-1999 across all six model poisoning attack settings.
KeTS (Kernel-based Trust Segmentation)
Novel technique introduced
Federated Learning (FL) enables multiple users to collaboratively train a global model in a distributed manner without revealing their personal data. However, FL remains vulnerable to model poisoning attacks, where malicious actors inject crafted updates to compromise the global model's accuracy. We propose a novel defense mechanism, Kernel-based Trust Segmentation (KeTS), to counter model poisoning attacks. Unlike existing approaches, KeTS analyzes the evolution of each client's updates and effectively segments malicious clients using Kernel Density Estimation (KDE), even in the presence of benign outliers. We thoroughly evaluate KeTS's performance against the six most effective model poisoning attacks (i.e., Trim-Attack, Krum-Attack, Min-Max attack, Min-Sum attack, and their variants) on four different datasets (i.e., MNIST, Fashion-MNIST, CIFAR-10, and KDD-CUP-1999) and compare its performance with three classical robust schemes (i.e., Krum, Trim-Mean, and Median) and a state-of-the-art defense (i.e., FLTrust). Our results show that KeTS outperforms the existing defenses in every attack setting, beating the best-performing defense by an overall average of >24% (on MNIST), >14% (on Fashion-MNIST), >9% (on CIFAR-10), and >11% (on KDD-CUP-1999). A series of further experiments (varying poisoning approaches, attacker population, etc.) reveals the consistent and superior performance of KeTS under diverse conditions. KeTS is a practical solution, as it satisfies all three defense objectives (i.e., fidelity, robustness, and efficiency) without imposing additional overhead on the clients. Finally, we also discuss a simple, yet effective extension to KeTS to handle consistent-untargeted attacks (e.g., sign-flipping) as well as targeted attacks (e.g., label-flipping).
Key Contributions
- KeTS computes per-client trust scores by analyzing the temporal evolution of each client's model updates, then segments benign from malicious clients using Kernel Density Estimation — robust to benign outliers in non-IID settings.
- Empirically evaluated against six untargeted model poisoning attacks in white-box scenarios on MNIST, Fashion-MNIST, CIFAR-10, and KDD-CUP-1999, outperforming Krum, Trim-Mean, Median, and FLTrust across all settings.
- Satisfies all three defense objectives (fidelity, robustness, efficiency) with no additional overhead on clients; extended to handle sign-flipping and label-flipping attacks.
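The KDE-based segmentation step described in the contributions above can be sketched as follows. This is an illustrative reconstruction, not the paper's exact procedure: it assumes per-client trust scores have already been computed (the hypothetical values below stand in for them), fits a Gaussian KDE over the scores, and splits clients at the deepest density valley, so a few benign outliers do not shift the cut the way a fixed threshold would.

```python
import numpy as np
from scipy.stats import gaussian_kde

def segment_clients(trust_scores, grid_size=512):
    """Split clients into trusted/untrusted groups by locating the
    deepest local minimum of a Gaussian KDE fitted over their trust
    scores. Illustrative sketch only (not KeTS's exact algorithm)."""
    scores = np.asarray(trust_scores, dtype=float)
    kde = gaussian_kde(scores)
    xs = np.linspace(scores.min(), scores.max(), grid_size)
    density = kde(xs)
    # Interior local minima of the density are candidate split points.
    is_valley = (density[1:-1] < density[:-2]) & (density[1:-1] < density[2:])
    valleys = np.where(is_valley)[0] + 1
    if valleys.size == 0:
        # Unimodal density: no separable malicious cluster detected.
        return np.ones(scores.shape, dtype=bool)
    threshold = xs[valleys[np.argmin(density[valleys])]]
    return scores >= threshold  # True = trusted (higher trust score)

# Hypothetical trust scores: benign clients cluster high, attackers low.
scores = [0.91, 0.88, 0.95, 0.90, 0.12, 0.15, 0.87, 0.10]
mask = segment_clients(scores)
```

Cutting at a density valley rather than at a fixed percentile is what makes this style of segmentation robust when the attacker fraction is unknown.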
🛡️ Threat Analysis
The primary contribution is a defense against untargeted Byzantine model poisoning attacks in federated learning (Trim-Attack, Krum-Attack, Min-Max, Min-Sum), where malicious clients inject crafted updates to degrade global model accuracy — the canonical ML02 federated learning threat.
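To make the threat concrete, the simplest consistent-untargeted attack in this family is sign-flipping (the case the paper's extension addresses): a malicious client negates, and typically scales, its honest local update before submission, pushing the aggregated global model away from the descent direction. A minimal hypothetical sketch (the `boost` scale factor is an illustrative assumption):

```python
import numpy as np

def sign_flip_update(honest_update, boost=5.0):
    """Consistent-untargeted poisoning: submit the negated (and scaled)
    honest local update so aggregation moves the global model in the
    ascent direction. Illustrative only."""
    return -boost * np.asarray(honest_update, dtype=float)

honest = np.array([0.2, -0.1, 0.05])     # honest local gradient update
malicious = sign_flip_update(honest)     # what the attacker submits
```

The optimization-based attacks named above (Trim-Attack, Krum-Attack, Min-Max, Min-Sum) are stealthier variants of the same goal: they craft the malicious direction and magnitude specifically to evade a given robust aggregation rule.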