FLARE: Adaptive Multi-Dimensional Reputation for Robust Client Reliability in Federated Learning
Abolfazl Younesi, Leon Kiss, Zahra Najafabadi Samani, Juan Aznar Poveda, Thomas Fahringer
Published on arXiv
2511.14715
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
FLARE improves robustness by up to 16% over state-of-the-art Byzantine-robust baselines and preserves convergence within 30% of the non-attacked baseline across 100 clients on MNIST, CIFAR-10, and SVHN.
FLARE
Novel technique introduced
Federated learning (FL) enables collaborative model training while preserving data privacy. However, it remains vulnerable to malicious clients who compromise model integrity through Byzantine attacks, data poisoning, or adaptive adversarial behaviors. Existing defense mechanisms rely on static thresholds and binary classification, failing to adapt to evolving client behaviors in real-world deployments. We propose FLARE, an adaptive reputation-based framework that transforms client reliability assessment from binary decisions to a continuous, multi-dimensional trust evaluation. FLARE integrates: (i) a multi-dimensional reputation score capturing performance consistency, statistical anomaly indicators, and temporal behavior, (ii) a self-calibrating adaptive threshold mechanism that adjusts security strictness based on model convergence and recent attack intensity, (iii) reputation-weighted aggregation with soft exclusion to proportionally limit suspicious contributions rather than eliminating clients outright, and (iv) a Local Differential Privacy (LDP) mechanism enabling reputation scoring on privatized client updates. We further introduce a highly evasive Statistical Mimicry (SM) attack, a benchmark adversary that blends honest gradients with synthetic perturbations and persistent drift to remain undetected by traditional filters. Extensive experiments with 100 clients on MNIST, CIFAR-10, and SVHN demonstrate that FLARE maintains high model accuracy and converges faster than state-of-the-art Byzantine-robust methods under diverse attack types, including label flipping, gradient scaling, adaptive attacks, ALIE, and SM. FLARE improves robustness by up to 16% and preserves model convergence within 30% of the non-attacked baseline, while achieving strong malicious-client detection performance with minimal computational overhead. Code: https://github.com/Anonymous0-0paper/FLARE
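To make the idea of reputation-weighted aggregation with soft exclusion concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the `penalty` factor, and the rule "scale the weight of clients below the threshold instead of zeroing it" are assumptions chosen only to illustrate proportional down-weighting.

```python
from typing import List, Sequence


def soft_exclusion_weights(
    reputations: Sequence[float], threshold: float, penalty: float = 0.1
) -> List[float]:
    """Turn reputation scores into normalized aggregation weights.

    Assumption: clients at or above `threshold` keep their reputation as
    their raw weight, while clients below it are scaled by `penalty`
    (a hypothetical parameter) rather than being dropped entirely.
    """
    if not reputations:
        raise ValueError("at least one client is required")
    raw = [r if r >= threshold else r * penalty for r in reputations]
    total = sum(raw)
    if total <= 0.0:
        # Degenerate case: no usable reputation signal, fall back to a plain mean.
        return [1.0 / len(raw)] * len(raw)
    return [w / total for w in raw]


def aggregate(
    updates: Sequence[Sequence[float]],
    reputations: Sequence[float],
    threshold: float,
    penalty: float = 0.1,
) -> List[float]:
    """Reputation-weighted average of client update vectors."""
    weights = soft_exclusion_weights(reputations, threshold, penalty)
    dim = len(updates[0])
    return [sum(w * u[i] for w, u in zip(weights, updates)) for i in range(dim)]


if __name__ == "__main__":
    # Two honest clients and one low-reputation client sending a large update.
    updates = [[1.0, 1.0], [1.0, 1.0], [10.0, 10.0]]
    reputations = [0.9, 0.9, 0.2]
    print(soft_exclusion_weights(reputations, threshold=0.5))
    print(aggregate(updates, reputations, threshold=0.5))
```

In this toy setting the suspicious client still contributes, but with a much smaller weight than an unweighted mean would give it, which is the behavior "soft exclusion" refers to.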
Key Contributions
- FLARE: a multi-dimensional, adaptive reputation framework for FL that replaces binary client exclusion with continuous trust scoring across performance, statistical-anomaly, and temporal dimensions
- Self-calibrating adaptive threshold that adjusts security strictness based on model convergence and recent attack intensity, with reputation-weighted soft exclusion and an integrated Local Differential Privacy mechanism
- Statistical Mimicry (SM) attack: a novel evasive adversary that blends honest gradients with synthetic perturbations and persistent drift to evade traditional statistical filters
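The first two contributions can be illustrated with a small sketch of a continuous reputation update and a self-adjusting threshold. Everything numeric here is an assumption: the mixing weights, the `momentum` value, the coefficients `k_conv` and `k_attack`, and in particular the direction in which convergence moves the threshold are hypothetical, since this summary does not give the paper's formulas.

```python
from dataclasses import dataclass


def clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


@dataclass(frozen=True)
class ReputationWeights:
    # Hypothetical mixing weights for the three dimensions; they sum to 1.
    performance: float = 0.4
    anomaly: float = 0.4
    temporal: float = 0.2


def update_reputation(
    previous: float,
    performance: float,
    anomaly: float,
    temporal: float,
    weights: ReputationWeights = ReputationWeights(),
    momentum: float = 0.8,
) -> float:
    """Blend three signals in [0, 1] into a smoothed reputation score.

    performance: higher means more consistent contribution quality.
    anomaly:     higher means more statistically suspicious, so it is inverted.
    temporal:    higher means more stable behavior across rounds.
    The exponential moving average (momentum) is an assumed smoothing choice.
    """
    instant = (
        weights.performance * clamp(performance)
        + weights.anomaly * (1.0 - clamp(anomaly))
        + weights.temporal * clamp(temporal)
    )
    return clamp(momentum * previous + (1.0 - momentum) * instant)


def adaptive_threshold(
    base: float,
    convergence: float,
    attack_intensity: float,
    k_conv: float = 0.2,
    k_attack: float = 0.3,
    lo: float = 0.1,
    hi: float = 0.9,
) -> float:
    """Raise the trust threshold as training converges and attacks intensify.

    Assumption: both inputs lie in [0, 1], and stricter filtering is wanted
    late in training and when many clients were recently flagged.
    """
    raw = base + k_conv * clamp(convergence) + k_attack * clamp(attack_intensity)
    return clamp(raw, lo, hi)


if __name__ == "__main__":
    rep = update_reputation(previous=0.5, performance=0.9, anomaly=0.7, temporal=0.6)
    thr = adaptive_threshold(base=0.3, convergence=0.5, attack_intensity=0.4)
    print(f"reputation={rep:.3f} threshold={thr:.3f}")
```

A score produced this way stays continuous, so a client that behaves well again can recover gradually instead of being excluded by a single binary decision.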
🛡️ Threat Analysis
The core threat model is malicious FL clients that corrupt training through Byzantine attacks (gradient scaling, ALIE), label flipping, and the novel Statistical Mimicry poisoning attack. FLARE defends at training time through reputation-weighted aggregation and soft exclusion of suspicious client updates.
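For readers unfamiliar with these attack families, the toy sketch below shows what each one does to a client's labels or update vector. The label-flipping and gradient-scaling forms are common textbook variants. The `mimicry_update` function is only a loose illustration of "honest gradient plus synthetic perturbation plus persistent drift" as described in the abstract; its formula and the parameters `blend`, `noise_std`, and `drift_rate` are assumptions, not the paper's SM attack.

```python
import random
from typing import List, Optional, Sequence


def flip_labels(labels: Sequence[int], num_classes: int) -> List[int]:
    """Label flipping: map class y to its mirror class (num_classes - 1 - y)."""
    return [num_classes - 1 - y for y in labels]


def scale_gradient(gradient: Sequence[float], factor: float) -> List[float]:
    """Gradient scaling: multiply the honest update by an attacker-chosen factor."""
    return [factor * g for g in gradient]


def mimicry_update(
    honest: Sequence[float],
    drift_direction: Sequence[float],
    round_index: int,
    blend: float = 0.1,
    noise_std: float = 0.01,
    drift_rate: float = 0.001,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Illustrative mimicry-style update (assumed form, not the paper's).

    Most of the honest gradient is kept, a small Gaussian perturbation is
    mixed in, and a drift term grows slowly with the round index.
    """
    rng = rng or random.Random()
    result = []
    for g, d in zip(honest, drift_direction):
        noise = rng.gauss(0.0, noise_std)
        result.append((1.0 - blend) * g + blend * noise + drift_rate * round_index * d)
    return result


if __name__ == "__main__":
    honest = [0.5, -0.2, 0.1]
    print(flip_labels([0, 3, 9], num_classes=10))
    print(scale_gradient(honest, factor=-5.0))
    print(mimicry_update(honest, [1.0, 0.0, 0.0], round_index=20, rng=random.Random(0)))
```

The scaled update is easy to spot by its norm, whereas the mimicry-style update stays close to the honest one in any single round, which is why per-round statistical filters can miss it.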