From Inexact Gradients to Byzantine Robustness: Acceleration and Optimization under Similarity

Renaud Gaucher 1,2, Aymeric Dieuleveut 1, Hadrien Hendrikx 2

0 citations · 39 references · arXiv

Published on arXiv (arXiv:2602.03329)

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

Byzantine-robust optimization provably reduces to inexact gradient descent, enabling an accelerated algorithm with communication complexity improvements demonstrated both theoretically and empirically over prior methods.

Optimization under Similarity for Byzantine Robustness

Novel technique introduced


Standard federated learning algorithms are vulnerable to adversarial nodes, a.k.a. Byzantine failures. To address this issue, robust distributed learning algorithms have been developed, which typically replace parameter averaging with robust aggregations. While generic conditions on these aggregations exist to guarantee the convergence of (Stochastic) Gradient Descent (SGD), the analyses remain rather ad hoc. This hinders the development of more complex robust algorithms, such as accelerated ones. In this work, we show that Byzantine-robust distributed optimization can, under standard generic assumptions, be cast as general optimization with inexact gradient oracles (with both additive and multiplicative error terms), an active field of research. This allows us, for instance, to show directly that GD on top of standard robust aggregation procedures obtains the optimal asymptotic error in the Byzantine setting. Going further, we propose two optimization schemes to speed up convergence. The first is a Nesterov-type accelerated scheme whose proof derives directly from accelerated inexact gradient results applied to our formulation. The second hinges on Optimization under Similarity, in which the server leverages an auxiliary loss function that approximates the global loss. Both approaches drastically reduce the communication complexity compared to previous methods, as we show theoretically and empirically.
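To make the "replace averaging with a robust aggregation" idea concrete, here is a minimal hypothetical sketch (not the paper's code) of one server step of robust distributed GD, using the coordinate-wise median as an illustrative aggregator; the worker setup and function names are assumptions for this example:

```python
import numpy as np

np.random.seed(0)  # fixed seed so the toy example is reproducible

def coordinate_wise_median(gradients):
    """Robust aggregation: coordinate-wise median of worker gradients.

    Unlike the mean, the median is not arbitrarily skewed by a
    minority of Byzantine (adversarial) gradient vectors.
    """
    return np.median(np.stack(gradients), axis=0)

def robust_gd_step(x, worker_gradients, lr=0.1):
    """One server step: aggregate robustly, then take a gradient step."""
    g = coordinate_wise_median(worker_gradients)
    return x - lr * g

# Honest workers report noisy versions of the true gradient of
# f(x) = ||x||^2 / 2 (which is x); one Byzantine worker sends a
# scaled, sign-flipped gradient to push the model away from the optimum.
x = np.array([1.0, -2.0])
honest = [x + 0.01 * np.random.randn(2) for _ in range(4)]
byzantine = [-100.0 * x]
x_new = robust_gd_step(x, honest + byzantine)
```

With the median, the single attacker is outvoted coordinate-wise by the four honest workers, so the iterate still moves toward the optimum; plain averaging would instead follow the attacker's huge flipped gradient.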


Key Contributions

  • Formal reduction showing Byzantine-robust distributed optimization is a special case of optimization with inexact gradient oracles (additive and multiplicative error), enabling direct reuse of inexact-gradient results
  • Nesterov-type accelerated Byzantine-robust algorithm achieving the optimal accelerated convergence rate O(√(μ/L)) under standard heterogeneity assumptions
  • Optimization under Similarity scheme where the server leverages an auxiliary loss to drastically reduce communication complexity compared to prior Byzantine-robust methods
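The reduction in the first contribution treats the robust aggregate as an inexact gradient oracle with both multiplicative and additive error. A hypothetical sketch of that condition, ‖g(x) − ∇f(x)‖ ≤ δ_mult·‖∇f(x)‖ + δ_add (the function name and constants here are illustrative assumptions, not the paper's notation):

```python
import numpy as np

def satisfies_inexact_oracle(g, true_grad, delta_mult, delta_add):
    """Check the (multiplicative + additive) inexact-gradient condition
    ||g - grad f(x)|| <= delta_mult * ||grad f(x)|| + delta_add.

    In the paper's reduction, the output of a robust aggregator plays
    the role of g; delta_mult and delta_add come from the aggregation
    guarantees and the heterogeneity of the honest workers' gradients.
    """
    err = np.linalg.norm(g - true_grad)
    bound = delta_mult * np.linalg.norm(true_grad) + delta_add
    return err <= bound

true_grad = np.array([3.0, 4.0])   # ||grad f(x)|| = 5
g = np.array([3.3, 4.1])           # aggregate with error norm ~= 0.316
ok = satisfies_inexact_oracle(g, true_grad, delta_mult=0.1, delta_add=0.0)
```

Once the aggregate is certified as such an oracle, convergence (and acceleration) results for inexact gradient methods apply off the shelf, which is the point of the reduction.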

🛡️ Threat Analysis

Data Poisoning Attack

Byzantine nodes in federated learning are malicious clients sending adversarially crafted gradient updates to corrupt the global model — exactly the threat ML02 covers for federated learning. The paper proposes and analyzes defense mechanisms (robust aggregation + accelerated optimization schemes) against this threat.
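To illustrate why the undefended baseline fails against this ML02-style threat, a toy sketch (an assumed attack scenario, not taken from the paper) shows that with plain averaging a single Byzantine worker can reverse the update direction:

```python
import numpy as np

# Plain mean aggregation: one attacker controls the update.
x = np.array([1.0, -2.0])
honest = [x for _ in range(4)]          # true gradient of f(x) = ||x||^2 / 2
byzantine = [-10.0 * len(honest) * x]   # scaled sign-flip attack
mean_update = np.mean(np.stack(honest + byzantine), axis=0)

# The averaged "gradient" now has negative inner product with the true
# gradient x, i.e. a GD step would move AWAY from the optimum.
points_wrong_way = float(mean_update @ x) < 0
```

This is exactly the failure mode that robust aggregation (and the accelerated schemes built on top of it) is designed to prevent: the attacker's influence on a robust aggregate is bounded regardless of the magnitude of its reported gradient.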


Details

Domains
federated-learning
Model Types
federated
Threat Tags
training_time, white_box
Applications
federated learning, distributed optimization