Defense · 2025

KeTS: Kernel-based Trust Segmentation against Model Poisoning Attacks

Ankit Gangwal, Mauro Conti, Tommaso Pauselli



Published on arXiv — arXiv:2501.06729

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

KeTS outperforms the best existing defense by >24% on MNIST, >14% on Fashion-MNIST, >9% on CIFAR-10, and >11% on KDD-CUP-1999 across all six model poisoning attack settings.

KeTS (Kernel-based Trust Segmentation)

Novel technique introduced


Federated Learning (FL) enables multiple users to collaboratively train a global model in a distributed manner without revealing their personal data. However, FL remains vulnerable to model poisoning attacks, where malicious actors inject crafted updates to compromise the global model's accuracy. We propose a novel defense mechanism, Kernel-based Trust Segmentation (KeTS), to counter model poisoning attacks. Unlike existing approaches, KeTS analyzes the evolution of each client's updates and effectively segments malicious clients using Kernel Density Estimation (KDE), even in the presence of benign outliers. We thoroughly evaluate KeTS's performance against the six most effective model poisoning attacks (i.e., Trim-Attack, Krum-Attack, Min-Max attack, Min-Sum attack, and their variants) on four different datasets (i.e., MNIST, Fashion-MNIST, CIFAR-10, and KDD-CUP-1999) and compare its performance with three classical robust schemes (i.e., Krum, Trim-Mean, and Median) and a state-of-the-art defense (i.e., FLTrust). Our results show that KeTS outperforms the existing defenses in every attack setting, beating the best-performing defense by an overall average of >24% (on MNIST), >14% (on Fashion-MNIST), >9% (on CIFAR-10), and >11% (on KDD-CUP-1999). A series of further experiments (varying poisoning approaches, attacker population, etc.) reveal the consistent and superior performance of KeTS under diverse conditions. KeTS is a practical solution as it satisfies all three defense objectives (i.e., fidelity, robustness, and efficiency) without imposing additional overhead on the clients. Finally, we also discuss a simple, yet effective extension to KeTS to handle consistent-untargeted (e.g., sign-flipping) attacks as well as targeted attacks (e.g., label-flipping).


Key Contributions

  • KeTS computes per-client trust scores by analyzing the temporal evolution of each client's model updates, then segments benign from malicious clients using Kernel Density Estimation — robust to benign outliers in non-IID settings.
  • Empirically evaluated against six untargeted model poisoning attacks in white-box scenarios on MNIST, Fashion-MNIST, CIFAR-10, and KDD-CUP-1999, outperforming Krum, Trim-Mean, Median, and FLTrust across all settings.
  • Satisfies all three defense objectives (fidelity, robustness, efficiency) with no additional overhead on clients; extended to handle sign-flipping and label-flipping attacks.
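The segmentation idea above can be sketched with a 1-D Gaussian KDE: fit a density over per-client trust scores, find the deepest valley between modes, and cut there. This is an illustrative sketch only, not the paper's exact algorithm; the trust scores, bandwidth, and valley-cut rule are all assumptions for demonstration.

```python
import numpy as np

def gaussian_kde(x, grid, bandwidth):
    """Evaluate a 1-D Gaussian kernel density estimate on `grid`."""
    z = (grid[:, None] - x[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(x) * bandwidth * np.sqrt(2 * np.pi))

def segment_clients(trust_scores, bandwidth=0.1):
    """Split clients at the deepest interior density valley
    (hypothetical segmentation rule; True = treated as benign)."""
    grid = np.linspace(trust_scores.min(), trust_scores.max(), 512)
    density = gaussian_kde(trust_scores, grid, bandwidth)
    # Interior local minima of the density curve are candidate cut points.
    interior = (density[1:-1] < density[:-2]) & (density[1:-1] < density[2:])
    minima = np.where(interior)[0] + 1
    if len(minima) == 0:  # unimodal density: treat everyone as benign
        return np.ones_like(trust_scores, dtype=bool)
    cut = grid[minima[np.argmin(density[minima])]]
    return trust_scores > cut

# Hypothetical trust scores: attackers cluster low, benign clients high.
scores = np.array([0.10, 0.12, 0.15, 0.80, 0.85, 0.90, 0.95, 0.70])
benign_mask = segment_clients(scores)
```

Because the cut point comes from the density's own valley rather than a fixed count of outliers, a few benign clients with unusual (non-IID) scores do not automatically get discarded the way they would under a trimming rule.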

🛡️ Threat Analysis

Data Poisoning Attack

Primary contribution is a defense against untargeted Byzantine model poisoning attacks in federated learning (Trim-Attack, Krum-Attack, Min-Max, Min-Sum), where malicious clients inject crafted updates to degrade global model accuracy — the canonical ML02 federated learning threat.
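To make the threat concrete, the toy sketch below shows how even a simple consistent-untargeted attack (sign-flipping, which the paper's extension also covers) corrupts plain federated averaging; the scaling factor, client counts, and noise model are assumptions for illustration, not the paper's crafted Trim/Krum/Min-Max/Min-Sum updates.

```python
import numpy as np

def fed_avg(updates):
    """Plain federated averaging of client gradient updates."""
    return np.mean(updates, axis=0)

rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0, 0.5])

# Seven benign clients send noisy versions of the true gradient.
benign = [true_grad + 0.1 * rng.standard_normal(3) for _ in range(7)]
# Three sign-flipping attackers send scaled, negated updates.
malicious = [-3.0 * true_grad for _ in range(3)]

clean = fed_avg(benign)               # aligned with the true gradient
poisoned = fed_avg(benign + malicious)  # pulled past zero by attackers
```

With 30% attackers and a modest scaling factor, the poisoned average points away from the true descent direction, so the undefended global model is driven uphill rather than downhill, which is exactly the accuracy degradation a robust aggregator must prevent.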


Details

Domains
federated-learning, vision, tabular
Model Types
federated, cnn, traditional_ml
Threat Tags
white_box, training_time, untargeted
Datasets
MNIST, Fashion-MNIST, CIFAR-10, KDD-CUP-1999
Applications
federated learning, image classification, network intrusion detection