
Byzantine-Robust Optimization under $(L_0, L_1)$-Smoothness

Arman Bolatov, Samuel Horváth, Martin Takáč, Eduard Gorbunov



Published on arXiv: 2603.12512

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

Byz-NSGDM achieves an $O(K^{-1/4})$ convergence rate under Byzantine attacks and remains robust across a range of attack strategies (Bit Flipping, Label Flipping, Mimic, ALIE)

Byz-NSGDM

Novel technique introduced


We consider distributed optimization under Byzantine attacks in the presence of $(L_0,L_1)$-smoothness, a generalization of standard $L$-smoothness that captures functions with state-dependent gradient Lipschitz constants. We propose Byz-NSGDM, a normalized stochastic gradient descent method with momentum that achieves robustness against Byzantine workers while maintaining convergence guarantees. Our algorithm combines momentum normalization with Byzantine-robust aggregation enhanced by Nearest Neighbor Mixing (NNM) to handle both the challenges posed by $(L_0,L_1)$-smoothness and Byzantine adversaries. We prove that Byz-NSGDM achieves a convergence rate of $O(K^{-1/4})$ up to a Byzantine bias floor proportional to the robustness coefficient and gradient heterogeneity. Experimental validation on heterogeneous MNIST classification, synthetic $(L_0,L_1)$-smooth optimization, and character-level language modeling with a small GPT model demonstrates the effectiveness of our approach against various Byzantine attack strategies. An ablation study further shows that Byz-NSGDM is robust across a wide range of momentum and learning rate choices.
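The update described in the abstract (per-worker momentum, Byzantine-robust aggregation, then a normalized step) can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the coordinate-wise median stands in for the paper's NNM-enhanced aggregator, and the function names and hyperparameters are made up for the example.

```python
import numpy as np

def robust_aggregate(momenta):
    """Coordinate-wise median -- a simple Byzantine-robust aggregator,
    used here as a stand-in for the paper's NNM-enhanced aggregation."""
    return np.median(momenta, axis=0)

def byz_nsgdm_step(x, momenta, worker_grads, lr=0.1, beta=0.9):
    """One sketched Byz-NSGDM-style step: update each worker's momentum,
    aggregate robustly, then take a *normalized* step whose length does not
    depend on the gradient scale -- the role normalization plays under
    (L0, L1)-smoothness, where the gradient Lipschitz constant is
    state-dependent."""
    new_momenta = [beta * m + (1 - beta) * g
                   for m, g in zip(momenta, worker_grads)]
    agg = robust_aggregate(np.stack(new_momenta))
    norm = np.linalg.norm(agg)
    if norm > 0:
        x = x - lr * agg / norm  # fixed-length step along the robust direction
    return x, new_momenta
```

With four honest workers and one sign-flipping worker on a quadratic, the median ignores the outlier and the iterate still moves toward the minimum.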


Key Contributions

  • First Byzantine-robust optimization algorithm designed for $(L_0, L_1)$-smooth functions, combining normalized momentum with robust aggregation
  • Theoretical convergence rate of O(K^{-1/4}) under Byzantine attacks with explicit dependence on robustness coefficient and gradient heterogeneity
  • Empirical validation across MNIST, synthetic optimization, and GPT language modeling against multiple Byzantine attack strategies

🛡️ Threat Analysis

Data Poisoning Attack

Defends against Byzantine attacks in federated learning where malicious workers send arbitrary or corrupted gradient updates to degrade global model performance — this is data poisoning via gradient manipulation at training time.
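To see why plain gradient averaging is vulnerable to this kind of gradient manipulation, consider a minimal simulation (the worker counts and attack scale are invented for illustration): a single malicious worker sending a scaled, sign-flipped gradient fully hijacks the mean, while a coordinate-wise median stays close to the honest direction.

```python
import numpy as np

# Four honest workers report gradients near the true descent direction;
# one Byzantine worker sends a scaled sign-flipped gradient.
honest = [np.array([1.0, 2.0]) + 0.01 * i for i in range(4)]
byzantine = [-50.0 * np.array([1.0, 2.0])]
grads = np.stack(honest + byzantine)

mean_agg = grads.mean(axis=0)          # plain averaging: hijacked by one worker
median_agg = np.median(grads, axis=0)  # coordinate-wise median: stays on course

print("mean:  ", mean_agg)    # points away from the minimum
print("median:", median_agg)  # close to the honest gradients
```

Robust aggregators like the median (or the paper's NNM-enhanced aggregation) bound the influence any single worker can exert, which is the core defense mechanism here.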


Details

Domains
federated-learning, vision, nlp
Model Types
federated, cnn, transformer
Threat Tags
training_time, grey_box
Datasets
MNIST, character-level text corpus
Applications
federated learning, distributed optimization, image classification, language modeling