DEFEND: Poisoned Model Detection and Malicious Client Exclusion Mechanism for Secure Federated Learning-based Road Condition Classification
Sheng Liu, Panos Papadimitratos
Published on arXiv
2512.06172
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
DEFEND outperforms seven baseline countermeasures by at least 15.78% and, under TLFAs, matches the model performance of completely attack-free scenarios
DEFEND
Novel technique introduced
Federated Learning (FL) has drawn the attention of the Intelligent Transportation Systems (ITS) community. FL can train various models for ITS tasks, notably camera-based Road Condition Classification (RCC), in a privacy-preserving collaborative way. However, opening up to collaboration also opens FL-based RCC systems to adversaries, i.e., misbehaving participants that can launch Targeted Label-Flipping Attacks (TLFAs) and threaten transportation safety. Adversaries mounting TLFAs poison training data to misguide model predictions, from an actual source class (e.g., wet road) to a wrongly perceived target class (e.g., dry road). Existing countermeasures against poisoning attacks cannot maintain model performance under TLFAs close to the attack-free level, because they lack model misbehavior detection specific to TLFAs and neglect client exclusion after detection. To close this research gap, we propose DEFEND, which includes a poisoned model detection strategy that leverages neuron-wise magnitude analysis for attack goal identification and Gaussian Mixture Model (GMM)-based clustering. DEFEND discards poisoned model contributions in each round and adapts client ratings accordingly, eventually excluding malicious clients. Extensive evaluation involving various FL-RCC models and tasks shows that DEFEND can thwart TLFAs and outperform seven baseline countermeasures by at least 15.78%, remarkably achieving under attack the same performance as in attack-free scenarios.
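A minimal sketch of the attack-goal identification idea, assuming (hypothetically) that a TLFA concentrates unusually large final-layer weight updates on the source- and target-class output neurons; the function name and the specific L2-norm heuristic are illustrative, not the paper's exact method:

```python
import numpy as np

def suspected_flip_pair(final_layer_update):
    """Rank output-class neurons by the L2 norm of their weight-update rows.

    Hypothetical heuristic: a source-to-target label flip is assumed to
    produce the two largest per-class update magnitudes, on the source
    and target output neurons.
    """
    norms = np.linalg.norm(final_layer_update, axis=1)  # one norm per class
    top_two = np.argsort(norms)[-2:]                    # two most-perturbed classes
    return set(top_two.tolist())

# Toy update: 6 road-condition classes x 32 penultimate features,
# with classes 2 ("wet") and 5 ("dry") perturbed as by a 2 -> 5 flip.
rng = np.random.default_rng(0)
update = rng.normal(0.0, 0.01, size=(6, 32))
update[2] -= 0.5   # source-class neuron pushed down
update[5] += 0.5   # target-class neuron pushed up
print(suspected_flip_pair(update))  # the flagged pair contains classes 2 and 5
```

Real updates would need a benign reference for comparison; the point here is only that the per-neuron magnitudes expose which class pair the attack targets.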
Key Contributions
- Neuron-wise magnitude analysis for identifying the attack goal (source-to-target class flip) of TLFAs in FL model updates
- GMM-based clustering to distinguish poisoned from benign model contributions each aggregation round
- Adaptive client rating and permanent malicious-client exclusion mechanism that achieves attack-free performance even under active TLFA
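The GMM-based separation in the second contribution can be sketched with a tiny 1-D expectation-maximisation loop. The per-client scalar "anomaly score" input and the function name are assumptions for illustration (the paper clusters features of model updates):

```python
import numpy as np

def gmm_flag_poisoned(scores, n_iter=50):
    """Fit a 2-component 1-D Gaussian mixture via EM and flag clients
    assigned to the higher-mean component as poisoned.

    `scores` is a hypothetical per-client anomaly score, e.g. the update
    magnitude on the suspected source/target output neurons.
    """
    x = np.asarray(scores, dtype=float)
    mu = np.array([x.min(), x.max()])   # initialise means at the extremes
    var = np.full(2, x.var() + 1e-6)    # shared initial variance
    w = np.full(2, 0.5)                 # equal mixing weights
    for _ in range(n_iter):
        # E-step: responsibility of each component for each score
        dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances from responsibilities
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return resp.argmax(axis=1) == int(mu.argmax())

# Four benign clients with small scores, two malicious with large ones.
flags = gmm_flag_poisoned([0.10, 0.12, 0.09, 0.11, 2.0, 2.1])
```

In practice a library implementation (e.g. scikit-learn's `GaussianMixture`) would replace the hand-rolled EM; the sketch only shows why two components suffice when poisoned and benign contributions form separable clusters.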
🛡️ Threat Analysis
Directly defends against Targeted Label-Flipping Attacks (TLFAs) in federated learning, where malicious clients poison training data by flipping source-class labels to a target class (e.g., wet road → dry road). The threat model is Byzantine FL participants corrupting the global model via poisoned data contributions — core ML02 territory. DEFEND detects poisoned model updates and excludes malicious clients using neuron-wise magnitude analysis and GMM clustering.
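For concreteness, the label-flipping step of a TLFA as described above can be simulated in a few lines; the class ids and helper name are illustrative only:

```python
import random

WET, DRY = 1, 0  # hypothetical class ids for the wet/dry road example

def tlfa_flip(labels, source=WET, target=DRY, fraction=1.0, seed=0):
    """A malicious client relabels a fraction of its source-class samples
    as the target class before local training (wet road -> dry road)."""
    rng = random.Random(seed)
    flipped = list(labels)
    victims = [i for i, y in enumerate(flipped) if y == source]
    for i in rng.sample(victims, int(fraction * len(victims))):
        flipped[i] = target
    return flipped

print(tlfa_flip([1, 0, 1, 1, 0]))  # every wet-road label becomes dry: [0, 0, 0, 0, 0]
```

A model trained on such data learns to call wet roads dry, which is exactly the safety-critical misprediction DEFEND's detection and exclusion mechanism is built to catch.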