
DEFEND: Poisoned Model Detection and Malicious Client Exclusion Mechanism for Secure Federated Learning-based Road Condition Classification

Sheng Liu , Panos Papadimitratos

0 citations · 43 references · arXiv


Published on arXiv · 2512.06172

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

DEFEND outperforms seven baseline countermeasures by at least 15.78% and achieves the same model performance under TLFA as in completely attack-free scenarios

DEFEND

Novel technique introduced


Federated Learning (FL) has drawn the attention of the Intelligent Transportation Systems (ITS) community. FL can train various models for ITS tasks, notably camera-based Road Condition Classification (RCC), in a privacy-preserving collaborative way. However, opening up to collaboration also opens FL-based RCC systems to adversaries, i.e., misbehaving participants that can launch Targeted Label-Flipping Attacks (TLFAs) and threaten transportation safety. Adversaries mounting TLFAs poison training data to misguide model predictions, from an actual source class (e.g., wet road) to a wrongly perceived target class (e.g., dry road). Existing countermeasures against poisoning attacks cannot maintain model performance under TLFAs close to the performance level in attack-free scenarios, because they lack model misbehavior detection specific to TLFAs and neglect client exclusion after detection. To close this research gap, we propose DEFEND, which includes a poisoned model detection strategy that leverages neuron-wise magnitude analysis for attack goal identification and Gaussian Mixture Model (GMM)-based clustering. DEFEND discards poisoned model contributions in each round and adapts client ratings accordingly, eventually excluding malicious clients. Extensive evaluation involving various FL-RCC models and tasks shows that DEFEND can thwart TLFAs and outperform seven baseline countermeasures by at least 15.78%, remarkably achieving under attack the same performance as in attack-free scenarios.


Key Contributions

  • Neuron-wise magnitude analysis for identifying the attack goal (source-to-target class flip) of TLFAs in FL model updates
  • GMM-based clustering to distinguish poisoned from benign model contributions each aggregation round
  • Adaptive client rating and permanent malicious-client exclusion mechanism that achieves attack-free performance even under active TLFA
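The GMM-based detection step above can be sketched roughly as follows. This is not the authors' implementation: the per-client "suspicion scores" below stand in for the neuron-wise magnitude deviations the paper computes, and the pure-Python EM routine and its function name are illustrative assumptions. The idea is to fit a two-component 1-D Gaussian mixture to the scores and flag the higher-mean cluster as poisoned.

```python
# Illustrative sketch only (not DEFEND's actual code): cluster scalar
# per-client suspicion scores with a 2-component 1-D GMM fitted by EM,
# then flag clients assigned to the higher-mean component.
import math


def gmm_flag(scores, iters=50):
    # Initialize means at the extremes, with a shared broad variance.
    mu = [min(scores), max(scores)]
    var = [max(1e-6, (max(scores) - min(scores)) ** 2 / 4)] * 2
    pi = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: posterior responsibility of each component per score.
        resp = []
        for s in scores:
            p = [pi[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(s - mu[k]) ** 2 / (2 * var[k]))
                 for k in range(2)]
            z = sum(p) or 1e-12
            resp.append([pk / z for pk in p])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            mu[k] = sum(r[k] * s for r, s in zip(resp, scores)) / nk
            var[k] = max(1e-6, sum(r[k] * (s - mu[k]) ** 2
                                   for r, s in zip(resp, scores)) / nk)
            pi[k] = nk / len(scores)
    # Higher-mean component = larger deviation = suspected poisoned.
    bad = max(range(2), key=lambda k: mu[k])
    return [i for i, r in enumerate(resp) if r[bad] > 0.5]
```

In a full pipeline, the flagged clients' model contributions would be discarded for the round and their ratings lowered, mirroring the exclusion mechanism described above.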

🛡️ Threat Analysis

Data Poisoning Attack

Directly defends against Targeted Label-Flipping Attacks (TLFAs) in federated learning, where malicious clients poison training data by flipping source-class labels to a target class (e.g., wet road → dry road). The threat model is Byzantine FL participants corrupting the global model via poisoned data contributions — core ML02 territory. DEFEND detects poisoned model updates and excludes malicious clients using neuron-wise magnitude analysis and GMM clustering.
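To make the threat concrete, a minimal sketch of what a TLFA-mounting client does to its local data before training. The class names and tuple layout are illustrative assumptions, not the paper's dataset format.

```python
# Hypothetical TLFA sketch: a malicious client relabels every sample of
# the source class (e.g., "wet") as the target class (e.g., "dry"),
# so the locally trained model learns the wrong mapping.
def flip_labels(dataset, source, target):
    # dataset: list of (features, label) pairs held by one client
    return [(x, target if y == source else y) for x, y in dataset]


poisoned = flip_labels(
    [("img0", "wet"), ("img1", "dry"), ("img2", "wet")],
    source="wet", target="dry",
)
# every "wet" label becomes "dry"; benign labels are untouched
```

Aggregating updates trained on such data biases the global model toward predicting the safe-looking target class for genuinely hazardous source-class inputs, which is exactly the misclassification DEFEND aims to detect.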


Details

Domains
vision · federated-learning
Model Types
cnn · federated
Threat Tags
training_time · targeted
Applications
road condition classification · federated learning · intelligent transportation systems · autonomous vehicles