
Published on arXiv

2604.03226

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

Achieves significant improvement in model accuracy even when the fraction of malicious clients exceeds 50%, using only a small synthetic server dataset

Server Learning with Geometric Median Aggregation

Novel technique introduced


This paper explores the use of server learning to enhance the robustness of federated learning against malicious attacks, even when clients' training data are not independent and identically distributed. We propose a heuristic algorithm that combines server learning and client update filtering with geometric median aggregation. We demonstrate via experiments that this approach can achieve significant improvement in model accuracy even when the fraction of malicious clients is high (more than $50\%$ in some cases) and the dataset utilized by the server is small and possibly synthetic, with a distribution not necessarily close to that of the clients' aggregated data.
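The geometric median named in the abstract is the point minimizing the sum of Euclidean distances to all client updates; unlike the arithmetic mean, it cannot be dragged arbitrarily far by a minority of extreme (Byzantine) updates. A minimal sketch of the standard Weiszfeld iteration for computing it (an illustration of the aggregation primitive, not the paper's full algorithm):

```python
import numpy as np

def geometric_median(points, eps=1e-6, max_iter=100):
    """Weiszfeld's algorithm: an iteratively re-weighted mean that
    converges to the point minimizing the sum of Euclidean distances
    to all inputs, making it robust to outlier updates."""
    y = points.mean(axis=0)                    # start from the ordinary mean
    for _ in range(max_iter):
        dists = np.linalg.norm(points - y, axis=1)
        w = 1.0 / np.maximum(dists, eps)       # clamp to avoid division by zero
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < eps:    # converged
            return y_new
        y = y_new
    return y
```

For example, with six honest updates near one value and four malicious updates a hundred times larger, the geometric median stays near the honest cluster while the mean would be pulled far away.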


Key Contributions

  • Heuristic algorithm combining server learning with synthetic data, client update filtering, and geometric median aggregation for Byzantine robustness
  • Demonstrates resilience to >50% malicious clients even with non-IID data and small synthetic server datasets
  • Shows that server dataset distribution does not need to match clients' aggregated data distribution

🛡️ Threat Analysis

Data Poisoning Attack

Defends against Byzantine attacks in federated learning, where malicious clients send arbitrary model updates to degrade the global model; this constitutes a training-time data poisoning attack carried out through corrupted model-update submissions.
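One way the pieces above can fit together in a single aggregation round is sketched below. The filter rule, the `keep_frac` threshold, and all names here are assumptions for illustration, not the paper's exact procedure: the server trains on its small (possibly synthetic) dataset to obtain a reference update, keeps only the client updates closest to that reference, and aggregates the survivors with the geometric median.

```python
import numpy as np

def weiszfeld(points, eps=1e-6, iters=100):
    # Geometric median via Weiszfeld's iteratively re-weighted mean.
    y = points.mean(axis=0)
    for _ in range(iters):
        w = 1.0 / np.maximum(np.linalg.norm(points - y, axis=1), eps)
        y = (w[:, None] * points).sum(axis=0) / w.sum()
    return y

def robust_round(client_updates, server_update, keep_frac=0.6):
    """Hypothetical sketch of one server-side aggregation round:
    filter client updates by distance to the server-learned reference
    update, then aggregate the survivors with the geometric median."""
    u = np.asarray(client_updates, dtype=float)
    dists = np.linalg.norm(u - np.asarray(server_update), axis=1)
    k = max(1, int(keep_frac * len(u)))        # keep the closest k updates
    kept = u[np.argsort(dists)[:k]]
    return weiszfeld(kept)
```

Because the server's reference update only needs to point roughly in the right direction to rank honest updates ahead of arbitrary ones, this is consistent with the finding that the server dataset can be small and distributed differently from the clients' aggregated data.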


Details

Domains
federated-learning
Model Types
federated
Threat Tags
training_time
Applications
federated learning