Trade-off in Estimating the Number of Byzantine Clients in Federated Learning
Ziyi Chen, Su Zhang, Heng Huang
Published on arXiv
2510.04432
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
Optimal error bounds for Byzantine-robust federated learning are proportional to f̂/(n−f−f̂), which increases monotonically in f̂, showing that over-estimating the number of Byzantine clients provably degrades performance even in benign settings.
Federated learning has attracted increasing attention in recent large-scale optimization and machine learning research and applications, but it is also vulnerable to Byzantine clients that can send arbitrary erroneous signals. Robust aggregators are commonly used to resist Byzantine clients. This usually requires estimating the unknown number $f$ of Byzantine clients in order to select an aggregator with the proper degree of robustness (i.e., the maximum number $\hat{f}$ of Byzantine clients the aggregator can tolerate). Such an estimate can strongly affect performance, yet to our knowledge its effect has not been systematically studied. This work fills the gap by theoretically analyzing the worst-case error of aggregators, as well as of the induced federated learning algorithm, for all combinations of $\hat{f}$ and $f$. Specifically, we show that underestimation ($\hat{f}<f$) can lead to arbitrarily poor performance for both aggregators and federated learning. For non-underestimation ($\hat{f}\ge f$), we prove matching optimal lower and upper bounds of the same order on the errors of both aggregators and federated learning. All these optimal bounds are proportional to $\hat{f}/(n-f-\hat{f})$ with $n$ clients, which monotonically increases with larger $\hat{f}$. This indicates a fundamental trade-off: while an aggregator with a larger robustness degree $\hat{f}$ can handle federated learning problems over the wider range $f\in [0,\hat{f}]$, its performance can deteriorate when there are actually fewer or even no Byzantine clients (i.e., $f\in [0,\hat{f})$).
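The monotone trade-off in the bound is easy to see numerically. A minimal sketch (not from the paper's code; `error_factor` is a hypothetical helper name) evaluating the factor $\hat{f}/(n-f-\hat{f})$ as the robustness degree grows:

```python
# Illustrative sketch: how the optimal error factor f_hat / (n - f - f_hat)
# from the paper's bounds grows with the robustness degree f_hat.
def error_factor(n: int, f: int, f_hat: int) -> float:
    """Scale of the optimal worst-case error for an aggregator with
    robustness degree f_hat, given n clients of which f are Byzantine.
    Valid in the non-underestimation regime f <= f_hat < n - f."""
    assert f <= f_hat < n - f, "requires f <= f_hat and n - f - f_hat > 0"
    return f_hat / (n - f - f_hat)

# With n = 100 clients and f = 5 actual Byzantine clients, over-estimating
# the Byzantine count inflates the bound monotonically:
for f_hat in (5, 10, 20, 40):
    print(f_hat, round(error_factor(100, 5, f_hat), 3))
```

Even with the true Byzantine count fixed at $f=5$, raising $\hat{f}$ from 5 to 40 grows the factor by more than an order of magnitude, matching the trade-off described in the abstract.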
Key Contributions
- First systematic theoretical analysis of how the estimated Byzantine client count f̂ affects robust aggregator and federated learning performance for all combinations of f̂ and f
- Proves that underestimation (f̂ < f) leads to arbitrarily poor performance, while non-underestimation yields tight optimal lower/upper bounds proportional to f̂/(n−f−f̂)
- Identifies a fundamental trade-off: larger robustness degree f̂ extends coverage to more Byzantine scenarios but degrades performance when actual Byzantine count is lower
🛡️ Threat Analysis
Focuses on Byzantine attacks in federated learning where malicious clients send arbitrary erroneous model updates to degrade global model performance — explicitly listed under ML02. The paper analyzes the theoretical properties of Byzantine-fault-tolerant aggregation protocols (robust aggregators such as geometric median, trimmed mean, Krum), which are ML02 defenses.
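To make the role of the robustness degree concrete, here is a minimal coordinate-wise trimmed-mean aggregator, one of the standard defenses named above. This is a generic sketch, not the paper's implementation; the parameter `f_hat` plays the role of the aggregator's robustness degree $\hat{f}$:

```python
import numpy as np

def trimmed_mean(updates: np.ndarray, f_hat: int) -> np.ndarray:
    """Coordinate-wise trimmed mean with robustness degree f_hat.

    updates: (n, d) array of client model updates. In each coordinate,
    the f_hat largest and f_hat smallest values are discarded before
    averaging, so up to f_hat Byzantine clients cannot dominate.
    """
    n = updates.shape[0]
    assert 2 * f_hat < n, "need n > 2 * f_hat so some values survive trimming"
    s = np.sort(updates, axis=0)              # sort each coordinate independently
    return s[f_hat : n - f_hat].mean(axis=0)  # average the middle n - 2*f_hat values

# Example: 8 honest updates near 1.0 plus 2 Byzantine outliers.
honest = np.ones((8, 3))
byzantine = np.full((2, 3), 1e6)
agg = trimmed_mean(np.vstack([honest, byzantine]), f_hat=2)
# agg stays at the honest mean despite the outliers
```

Note that choosing a larger `f_hat` trims more honest updates as well, which is exactly the performance cost in benign or mildly adversarial settings that the paper's $\hat{f}/(n-f-\hat{f})$ bounds quantify.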