Defense (2025)

Delayed Momentum Aggregation: Communication-efficient Byzantine-robust Federated Learning with Partial Participation

Kaoru Otsuka, Yuki Takezawa, Makoto Yamada



Published on arXiv: 2509.02970

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

With a 20% Byzantine ratio and a 10% partial participation rate, DeMoA achieves the best accuracy, while existing Byzantine-robust methods collapse because Byzantine clients can form a majority of the sampled set.

DeMoA (Delayed Momentum Aggregation)

Novel technique introduced


Partial participation is essential for communication-efficient federated learning at scale, yet existing Byzantine-robust methods typically assume full client participation. In the partial participation setting, a majority of the sampled clients may be Byzantine; once Byzantine clients dominate the sample, existing methods break down immediately. We introduce delayed momentum aggregation, a principle in which the central server aggregates cached momentum from non-sampled clients together with fresh momentum from sampled clients. This ensures that Byzantine clients remain a minority from the server's perspective even when they dominate the sampled set. We instantiate this principle in our optimizer DeMoA and analyze its convergence rate, showing that DeMoA is Byzantine-robust under partial participation. Experiments show that, with a 20% Byzantine ratio and only a 10% partial participation rate, DeMoA achieves the best accuracy even where existing methods fail.


Key Contributions

  • Delayed momentum aggregation principle: server caches momentum from non-sampled clients to ensure Byzantine clients are always a statistical minority, even when they dominate the sampled set
  • DeMoA optimizer instantiating this principle with convergence guarantees for Byzantine robustness under partial participation
  • Empirical demonstration that DeMoA maintains accuracy at 20% Byzantine ratio and 10% participation rate where FedAvg, FedCM, and Byz-VR-MARINA-PP fail
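The caching idea behind delayed momentum aggregation can be sketched in a few lines. The sketch below is an illustration of the principle only, not the paper's exact algorithm: the coordinate-wise median as the robust aggregator, the `demoa_round` function name, and the cache initialization are all assumptions made for demonstration.

```python
import numpy as np

def demoa_round(server_cache, fresh_momenta, sampled_ids):
    """One server round in the spirit of delayed momentum aggregation.

    server_cache: dict client_id -> last known momentum vector (all clients)
    fresh_momenta: dict client_id -> momentum received this round (sampled only)
    """
    # Refresh the cache with fresh momentum from the sampled clients.
    for cid in sampled_ids:
        server_cache[cid] = fresh_momenta[cid]
    # Aggregate over ALL cached momenta, so Byzantine clients stay a
    # minority from the server's perspective even if they dominate the sample.
    stacked = np.stack([server_cache[cid] for cid in sorted(server_cache)])
    # Coordinate-wise median as an illustrative robust aggregator (assumption).
    return np.median(stacked, axis=0)

# Usage: 10 clients, honest momentum = 1.0; both sampled clients are
# Byzantine and send -100.0, yet they are only 2 of the 10 cached entries.
cache = {i: np.ones(3) for i in range(10)}
fresh = {8: -100.0 * np.ones(3), 9: -100.0 * np.ones(3)}
aggregated = demoa_round(cache, fresh, [8, 9])
```

Aggregating over the sampled set alone would return the Byzantine value here, since both sampled clients are malicious; aggregating over the full cache keeps them a 2-of-10 minority.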

🛡️ Threat Analysis

Data Poisoning Attack

Byzantine clients in federated learning send arbitrary/adversarial model updates to degrade global model performance — this is model-level poisoning via malicious participants. The paper proposes DeMoA, a Byzantine-fault-tolerant aggregation defense. Explicitly matches 'Byzantine attacks in federated learning' and 'Byzantine-fault-tolerant FL protocols' under ML02.


Details

Domains
federated-learning
Model Types
federated, cnn
Threat Tags
training_time, untargeted
Datasets
MNIST, CIFAR-10
Applications
federated learning, distributed model training