From Inexact Gradients to Byzantine Robustness: Acceleration and Optimization under Similarity

Renaud Gaucher 1,2, Aymeric Dieuleveut 1, Hadrien Hendrikx 2

0 citations · 39 references · arXiv

Published on arXiv (arXiv:2602.03329)

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

Byzantine-robust optimization provably reduces to inexact gradient descent, enabling an accelerated algorithm with communication complexity improvements demonstrated both theoretically and empirically over prior methods.

Optimization under Similarity for Byzantine Robustness

Novel technique introduced


Standard federated learning algorithms are vulnerable to adversarial nodes, a.k.a. Byzantine failures. To address this issue, robust distributed learning algorithms have been developed, which typically replace parameter averaging with robust aggregations. While generic conditions on these aggregations exist to guarantee the convergence of (Stochastic) Gradient Descent (SGD), the analyses remain rather ad hoc. This hinders the development of more complex robust algorithms, such as accelerated ones. In this work, we show that Byzantine-robust distributed optimization can, under standard generic assumptions, be cast as general optimization with inexact gradient oracles (with both additive and multiplicative error terms), an active field of research. This allows us, for instance, to show directly that GD on top of standard robust aggregation procedures obtains the optimal asymptotic error in the Byzantine setting. Going further, we propose two optimization schemes to speed up convergence. The first is a Nesterov-type accelerated scheme whose proof derives directly from accelerated inexact gradient results applied to our formulation. The second hinges on Optimization under Similarity, in which the server leverages an auxiliary loss function that approximates the global loss. Both approaches drastically reduce the communication complexity compared to previous methods, as we show theoretically and empirically.
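To make the "replace averaging with a robust aggregation" idea concrete, here is a minimal hypothetical sketch (not the paper's code) of one server step of robust distributed GD, using the coordinate-wise median as an illustrative aggregator; the worker setup and function names are assumptions for this example:

```python
import numpy as np

np.random.seed(0)  # fixed seed so the toy example is reproducible

def coordinate_wise_median(gradients):
    """Robust aggregation: coordinate-wise median of worker gradients.

    Unlike the mean, the median is not arbitrarily skewed by a
    minority of Byzantine (adversarial) gradient vectors.
    """
    return np.median(np.stack(gradients), axis=0)

def robust_gd_step(x, worker_gradients, lr=0.1):
    """One server step: aggregate robustly, then take a gradient step."""
    g = coordinate_wise_median(worker_gradients)
    return x - lr * g

# Honest workers report noisy versions of the true gradient of
# f(x) = ||x||^2 / 2 (which is x); one Byzantine worker sends a
# scaled, sign-flipped gradient to push the model away from the optimum.
x = np.array([1.0, -2.0])
honest = [x + 0.01 * np.random.randn(2) for _ in range(4)]
byzantine = [-100.0 * x]
x_new = robust_gd_step(x, honest + byzantine)
```

With the median, the single attacker is outvoted coordinate-wise by the four honest workers, so the iterate still moves toward the optimum; plain averaging would instead follow the attacker's huge flipped gradient.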


Key Contributions

  • Formal reduction showing Byzantine-robust distributed optimization is a special case of optimization with inexact gradient oracles (additive and multiplicative error), enabling direct reuse of inexact-gradient results
  • Nesterov-type accelerated Byzantine-robust algorithm achieving the optimal accelerated convergence rate O(√(μ/L)) under standard heterogeneity assumptions
  • Optimization under Similarity scheme where the server leverages an auxiliary loss to drastically reduce communication complexity compared to prior Byzantine-robust methods
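The reduction in the first contribution treats the robust aggregate as an inexact gradient oracle with both multiplicative and additive error. A hypothetical sketch of that condition, ‖g(x) − ∇f(x)‖ ≤ δ_mult·‖∇f(x)‖ + δ_add (the function name and constants here are illustrative assumptions, not the paper's notation):

```python
import numpy as np

def satisfies_inexact_oracle(g, true_grad, delta_mult, delta_add):
    """Check the (multiplicative + additive) inexact-gradient condition
    ||g - grad f(x)|| <= delta_mult * ||grad f(x)|| + delta_add.

    In the paper's reduction, the output of a robust aggregator plays
    the role of g; delta_mult and delta_add come from the aggregation
    guarantees and the heterogeneity of the honest workers' gradients.
    """
    err = np.linalg.norm(g - true_grad)
    bound = delta_mult * np.linalg.norm(true_grad) + delta_add
    return err <= bound

true_grad = np.array([3.0, 4.0])   # ||grad f(x)|| = 5
g = np.array([3.3, 4.1])           # aggregate with error norm ~= 0.316
ok = satisfies_inexact_oracle(g, true_grad, delta_mult=0.1, delta_add=0.0)
```

Once the aggregate is certified as such an oracle, convergence (and acceleration) results for inexact gradient methods apply off the shelf, which is the point of the reduction.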

🛡️ Threat Analysis

Data Poisoning Attack

Byzantine nodes in federated learning are malicious clients sending adversarially crafted gradient updates to corrupt the global model — exactly the threat ML02 covers for federated learning. The paper proposes and analyzes defense mechanisms (robust aggregation + accelerated optimization schemes) against this threat.
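To illustrate why the undefended baseline fails against this ML02-style threat, a toy sketch (an assumed attack scenario, not taken from the paper) shows that with plain averaging a single Byzantine worker can reverse the update direction:

```python
import numpy as np

# Plain mean aggregation: one attacker controls the update.
x = np.array([1.0, -2.0])
honest = [x for _ in range(4)]          # true gradient of f(x) = ||x||^2 / 2
byzantine = [-10.0 * len(honest) * x]   # scaled sign-flip attack
mean_update = np.mean(np.stack(honest + byzantine), axis=0)

# The averaged "gradient" now has negative inner product with the true
# gradient x, i.e. a GD step would move AWAY from the optimum.
points_wrong_way = float(mean_update @ x) < 0
```

This is exactly the failure mode that robust aggregation (and the accelerated schemes built on top of it) is designed to prevent: the attacker's influence on a robust aggregate is bounded regardless of the magnitude of its reported gradient.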


Details

Domains
federated-learning
Model Types
federated
Threat Tags
training_time, white_box
Applications
federated learning, distributed optimization