Nesterov-Accelerated Robust Federated Learning Over Byzantine Adversaries
Lihan Xu 1, Yanjie Dong 1, Gang Wang 2, Runhao Zeng 1, Xiaoyi Fan 1, Xiping Hu 1
Published on arXiv (2511.02657)
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
Byrd-NAFL outperforms existing Byzantine-resilient FL baselines in convergence speed and accuracy under non-convex loss functions while maintaining robustness to diverse Byzantine attack strategies.
Byrd-NAFL
Novel technique introduced
We investigate robust federated learning, where a group of workers collaboratively train a shared model under the orchestration of a central server in the presence of Byzantine adversaries capable of arbitrary and potentially malicious behaviors. To simultaneously enhance communication efficiency and robustness against such adversaries, we propose a Byzantine-resilient Nesterov-Accelerated Federated Learning (Byrd-NAFL) algorithm. Byrd-NAFL seamlessly integrates Nesterov's momentum into the federated learning process alongside Byzantine-resilient aggregation rules to achieve fast convergence while safeguarding against gradient corruption. We establish a finite-time convergence guarantee for Byrd-NAFL under non-convex and smooth loss functions with a relaxed assumption on the aggregated gradients. Extensive numerical experiments validate the effectiveness of Byrd-NAFL and demonstrate its superiority over existing benchmarks in terms of convergence speed, accuracy, and resilience to diverse Byzantine attack strategies.
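The server-side update described above can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: it assumes coordinate-wise median (CwMed) as the Byzantine-resilient aggregation rule and uses the common PyTorch-style Nesterov momentum update; the function names (`cwmed`, `byrd_nafl_step`), learning rate, and momentum coefficient are all hypothetical choices for the sketch.

```python
import numpy as np

def cwmed(gradients):
    """Coordinate-wise median of worker gradients: a standard
    Byzantine-resilient aggregation rule (each coordinate takes the
    median across workers, so a minority of outliers cannot drag it far)."""
    return np.median(np.stack(gradients), axis=0)

def byrd_nafl_step(x, v, worker_grads, lr=0.1, beta=0.9):
    """One hypothetical Byrd-NAFL-style server step (sketch):
    robust aggregation followed by a Nesterov-momentum update.

    x: current model parameters, v: momentum buffer,
    worker_grads: list of gradient vectors reported by workers."""
    g = cwmed(worker_grads)          # robust aggregate, filters Byzantine updates
    v = beta * v + g                 # momentum accumulation
    x = x - lr * (g + beta * v)      # Nesterov-style look-ahead correction
    return x, v

# Usage: 8 honest workers plus 2 Byzantine workers sending huge corrupted gradients
rng = np.random.default_rng(0)
x, v = np.zeros(4), np.zeros(4)
honest = [np.ones(4) + 0.01 * rng.standard_normal(4) for _ in range(8)]
byzantine = [1e6 * np.ones(4) for _ in range(2)]
x, v = byrd_nafl_step(x, v, honest + byzantine)
```

The key design point is the ordering: momentum is applied to the *aggregated* gradient, so the Byzantine filtering happens before acceleration and corrupted updates cannot accumulate in the momentum buffer.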
Key Contributions
- Byrd-NAFL algorithm that seamlessly integrates Nesterov's momentum with Byzantine-resilient aggregation rules (Krum, CwMed, Bulyan, GeoMed) in federated learning
- Finite-time convergence guarantee for non-convex, smooth loss functions under relaxed assumptions on aggregated gradients — extending prior work beyond strongly convex settings
- Empirical demonstration of superior convergence speed, accuracy, and resilience versus existing benchmarks across diverse Byzantine attack strategies
🛡️ Threat Analysis
The paper proposes Byzantine-fault-tolerant FL aggregation rules (Krum, CwMed, Bulyan, GeoMed) integrated with Nesterov momentum to defend against malicious clients sending arbitrary gradient updates aimed at degrading global model performance — the canonical ML02 federated learning threat.
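A toy demonstration of the ML02 threat and why robust aggregation matters: a minority of Byzantine workers send scaled, sign-flipped gradients. A plain mean is dragged in the attackers' direction, while a coordinate-wise median stays close to the honest consensus. This is an illustrative sketch only; the attack scaling factor and worker counts are arbitrary choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
true_grad = np.ones(5)

# Seven honest workers report noisy copies of the true gradient
honest = [true_grad + 0.01 * rng.standard_normal(5) for _ in range(7)]
# Three Byzantine workers send scaled sign-flipped gradients (a classic poisoning strategy)
byzantine = [-10.0 * true_grad for _ in range(3)]
updates = honest + byzantine

mean_agg = np.mean(updates, axis=0)    # naive averaging: pulled negative by attackers
med_agg = np.median(updates, axis=0)   # CwMed: stays near the honest value of +1
```

With 7 honest and 3 Byzantine workers, the median of each coordinate falls among the honest reports, which is exactly the breakdown-point argument behind median-based rules.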