Robust Federated Learning under Adversarial Attacks via Loss-Based Client Clustering
Emmanouil Kritharakis¹, Dusan Jakovetic², Antonios Makris¹, Konstantinos Tserpes¹
Published on arXiv (arXiv:2508.12672)
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
The proposed loss-based clustering aggregation significantly outperforms robust FL baselines (Krum, Multi-Krum, Trimmed Mean, Median) across three datasets and three Byzantine attack types, while providing bounded optimality-gap convergence guarantees.
Loss-Based Client Clustering
Novel technique introduced
Federated Learning (FL) enables collaborative model training across multiple clients without sharing private data. We consider FL scenarios in which clients are subject to adversarial (Byzantine) attacks, while the FL server is trusted (honest) and holds a trustworthy side dataset. This may correspond, e.g., to cases where the server possesses trusted data prior to federation, or to the presence of a trusted client that temporarily assumes the server role. Our approach requires only two honest participants, i.e., the server and one client, to function effectively, and needs no prior knowledge of the number of malicious clients. Theoretical analysis demonstrates bounded optimality gaps even under strong Byzantine attacks. Experimental results show that our algorithm significantly outperforms standard and robust FL baselines such as Mean, Trimmed Mean, Median, Krum, and Multi-Krum under various attack strategies, including label flipping, sign flipping, and Gaussian noise addition, across the MNIST, FMNIST, and CIFAR-10 benchmarks using the Flower framework.
Key Contributions
- Loss-based client clustering aggregation method that uses a trusted server-side dataset to distinguish honest from malicious FL clients without prior knowledge of the number of adversaries
- Theoretical analysis proving bounded optimality gaps under strong Byzantine attacks requiring only two honest participants (server + one client)
- Empirical evaluation on MNIST, FMNIST, and CIFAR-10 showing consistent outperformance over Mean, Trimmed Mean, Median, Krum, and Multi-Krum baselines under label flipping, sign flipping, and Gaussian noise attacks
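The summary above describes the core idea (score each client's submitted model on the server's trusted side dataset, cluster clients by that loss, and aggregate only the low-loss cluster) but gives no pseudocode. The sketch below is an illustrative reconstruction under simplifying assumptions, not the paper's exact algorithm: a linear model with MSE loss stands in for the real network, and a largest-gap split on sorted losses stands in for whatever 1-D clustering rule the paper uses. All function names here are hypothetical.

```python
import numpy as np

def server_loss(w, X, y):
    # Mean squared error of a linear model w on the trusted side dataset.
    return float(np.mean((X @ w - y) ** 2))

def cluster_by_loss(losses):
    # Split 1-D losses into two clusters at the largest gap between
    # consecutive sorted values; return the indices of the low-loss cluster.
    # Caveat: this assumes at least one attacker is present; an all-honest
    # round would need a threshold fallback.
    order = np.argsort(losses)
    sorted_losses = np.asarray(losses)[order]
    cut = int(np.argmax(np.diff(sorted_losses))) + 1
    return set(order[:cut].tolist())

def aggregate(client_weights, X_trusted, y_trusted):
    # Score every client's model on the trusted data, keep the low-loss
    # cluster as "honest", and average only those models.
    losses = [server_loss(w, X_trusted, y_trusted) for w in client_weights]
    honest = cluster_by_loss(losses)
    kept = [client_weights[i] for i in sorted(honest)]
    return np.mean(kept, axis=0), honest

# Demo: four honest clients near the true model, one sign-flipping Byzantine.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(50, 3))
y = X @ w_true
clients = [w_true + 0.01 * rng.normal(size=3) for _ in range(4)]
clients.append(-w_true)  # Byzantine client sends the negated model
w_agg, honest = aggregate(clients, X, y)
```

Note that, consistent with the claim above, this rule needs no prior bound on the number of attackers: the Byzantine client is excluded purely because its loss on the trusted data is far from the honest cluster's.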
🛡️ Threat Analysis
The paper defends against Byzantine client attacks (label flipping, sign flipping, Gaussian noise injection) that corrupt training updates to degrade global model performance. These are data/gradient poisoning attacks at training time; the paper's contribution is a robust aggregation defense, loss-based client clustering, evaluated against them.
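The three attack types named above are standard Byzantine models, and a minimal sketch of each is shown below. The exact parameterizations (which label permutation is used for flipping, the noise variance, any scaling on the flipped sign) are assumptions here, not taken from the paper.

```python
import numpy as np

def label_flip(labels, num_classes):
    # Label flipping (data poisoning): remap each class c to
    # num_classes - 1 - c. The specific permutation is an assumption.
    return num_classes - 1 - labels

def sign_flip(update, scale=1.0):
    # Sign flipping (gradient/model poisoning): send the negated,
    # optionally scaled, update instead of the honest one.
    return -scale * update

def gaussian_noise(update, sigma=1.0, rng=None):
    # Gaussian noise addition: perturb the honest update with i.i.d. noise.
    rng = rng if rng is not None else np.random.default_rng()
    return update + rng.normal(0.0, sigma, size=update.shape)

# Demo on toy labels and a toy update vector.
labels = np.array([0, 1, 2])
update = np.array([0.5, -1.0])
flipped_labels = label_flip(labels, num_classes=3)
flipped_update = sign_flip(update)
noisy_update = gaussian_noise(update, sigma=0.1, rng=np.random.default_rng(0))
```

Label flipping corrupts the client's local data before training, while sign flipping and noise addition corrupt the update itself, which is why a server-side loss check on trusted data can catch all three: each one raises the poisoned model's loss on clean data.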