Byzantine-Robust Distributed SGD: A Unified Analysis and Tight Error Bounds
Boyuan Ruan 1, Xiaoyu Wang 1, Ya-Feng Liu 2
Published on arXiv
2604.10179
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
First tight analysis showing Byzantine error scales as O(κG²/(1-κB²)) where κ is Byzantine fraction, G is gradient bound, B is heterogeneity — proves this is unavoidable via matching lower bounds
R-DSGD and R-DSGD-M
Novel technique introduced
Byzantine-robust distributed optimization relies on robust aggregation rules to mitigate the influence of malicious Byzantine workers. Despite the proliferation of such rules, a unified convergence analysis framework that accommodates general data heterogeneity is lacking. In this work, we provide a thorough convergence theory of Byzantine-robust distributed stochastic gradient descent (SGD), analyzing variants both with and without local momentum. We establish the convergence rates for nonconvex smooth objectives and those satisfying the Polyak-Lojasiewicz condition under a general data heterogeneity assumption. Our analysis reveals that while stochasticity and data heterogeneity introduce unavoidable error floors, local momentum provably reduces the error component induced by stochasticity. Furthermore, we derive matching lower bounds to demonstrate that the upper bounds obtained in our analysis are tight and characterize the fundamental limits of Byzantine resilience under stochasticity and data heterogeneity. Empirical results support our theoretical findings.
Key Contributions
- Unified convergence analysis of Byzantine-robust distributed SGD covering both variants with and without local momentum under general data heterogeneity
- Establishes tight convergence rates for nonconvex and PL objectives, proving local momentum reduces stochastic error
- Derives matching lower bounds demonstrating the upper bounds are tight and characterize fundamental limits of Byzantine resilience
🛡️ Threat Analysis
Paper analyzes defenses (robust aggregation rules) against Byzantine workers who corrupt training by sending arbitrary malicious updates — this is data poisoning at training time in distributed/federated learning. The paper establishes convergence guarantees and fundamental limits of Byzantine-resilient aggregation under adversarial conditions.