Byzantine-Robust Optimization under $(L_0, L_1)$-Smoothness
Arman Bolatov, Samuel Horváth, Martin Takáč, Eduard Gorbunov
Published on arXiv (arXiv:2603.12512)
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
Achieves an $O(K^{-1/4})$ convergence rate under Byzantine attacks, up to a bias floor set by the robustness coefficient and gradient heterogeneity, with empirical robustness across attack strategies (Bit Flipping, Label Flipping, Mimic, ALIE)
Byz-NSGDM
Novel technique introduced
We consider distributed optimization under Byzantine attacks in the presence of $(L_0,L_1)$-smoothness, a generalization of standard $L$-smoothness that captures functions with state-dependent gradient Lipschitz constants. We propose Byz-NSGDM, a normalized stochastic gradient descent method with momentum that achieves robustness against Byzantine workers while maintaining convergence guarantees. Our algorithm combines momentum normalization with Byzantine-robust aggregation enhanced by Nearest Neighbor Mixing (NNM) to handle both the challenges posed by $(L_0,L_1)$-smoothness and Byzantine adversaries. We prove that Byz-NSGDM achieves a convergence rate of $O(K^{-1/4})$ up to a Byzantine bias floor proportional to the robustness coefficient and gradient heterogeneity. Experimental validation on heterogeneous MNIST classification, synthetic $(L_0,L_1)$-smooth optimization, and character-level language modeling with a small GPT model demonstrates the effectiveness of our approach against various Byzantine attack strategies. An ablation study further shows that Byz-NSGDM is robust across a wide range of momentum and learning rate choices.
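For reference, $(L_0, L_1)$-smoothness relaxes the standard Lipschitz-gradient condition by letting the local smoothness constant grow with the gradient norm. A common formulation (exact variants differ slightly across the generalized-smoothness literature) is:

```latex
% (L_0, L_1)-smoothness: the gradient Lipschitz constant is state-dependent,
% scaling with the gradient norm at the reference point.
\|\nabla f(x) - \nabla f(y)\| \le \bigl(L_0 + L_1 \|\nabla f(x)\|\bigr)\,\|x - y\|
% Standard L-smoothness is recovered as the special case L_1 = 0.
```

Normalized steps are the natural fit for this class: because the effective curvature can blow up where gradients are large, clipping the step direction to unit norm keeps the per-iteration movement bounded.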
Key Contributions
- First Byzantine-robust optimization algorithm designed for $(L_0, L_1)$-smooth functions, combining normalized momentum with robust aggregation
- Theoretical convergence rate of $O(K^{-1/4})$ under Byzantine attacks, with explicit dependence on the robustness coefficient and gradient heterogeneity
- Empirical validation across MNIST, synthetic optimization, and GPT language modeling against multiple Byzantine attack strategies
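The core update in the first contribution can be sketched as follows. This is a minimal illustration with hypothetical names: it uses a coordinate-wise median as a stand-in robust aggregator (the paper's aggregator is enhanced with Nearest Neighbor Mixing, omitted here) and plain worker momentum buffers.

```python
import numpy as np

def byz_nsgdm_step(momenta, grads, lr, beta):
    """One illustrative Byz-NSGDM round (sketch, not the paper's exact
    implementation): per-worker momentum update, robust aggregation of
    the momentum vectors, then a *normalized* descent step."""
    # Each worker folds its fresh stochastic gradient into its momentum buffer.
    new_momenta = [beta * m + (1.0 - beta) * g for m, g in zip(momenta, grads)]
    # Robustly aggregate momenta across workers; coordinate-wise median is a
    # simple placeholder for the paper's NNM-enhanced robust aggregator.
    agg = np.median(np.stack(new_momenta), axis=0)
    # Normalized step: the direction has (near-)unit norm, so the movement per
    # iteration is ~lr regardless of gradient scale, which is the key device
    # for handling (L0, L1)-smoothness.
    step = lr * agg / (np.linalg.norm(agg) + 1e-12)
    return new_momenta, step
```

The median tolerates a minority of arbitrarily corrupted momentum vectors, while the normalization decouples the step length from potentially exploding gradient norms.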
🛡️ Threat Analysis
Defends against Byzantine attacks in federated learning, where malicious workers send arbitrary or corrupted gradient updates to degrade global model performance: a form of training-time data poisoning via gradient manipulation.