
SecureLearn -- An Attack-agnostic Defense for Multiclass Machine Learning Against Data Poisoning Attacks

Anum Paracha, Junaid Arshad, Mohamed Ben Farah, Khalid Ismail



Published on arXiv · 2510.22274

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

SecureLearn maintains accuracy above 90% and reduces false discovery rate to 0.06 across all evaluated models, achieving at least 97% recall and F1-score for neural networks against all tested poisoning attacks.
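The false discovery rate cited above follows the standard definition for a detector: the fraction of flagged samples that were actually clean, FDR = FP / (FP + TP). A minimal sketch (the helper name and toy data are illustrative, not from the paper):

```python
import numpy as np

def false_discovery_rate(y_true, y_flagged):
    """FDR = FP / (FP + TP): the fraction of samples flagged as poisoned
    that were actually clean. Lower is better for a sanitization layer."""
    y_true = np.asarray(y_true, dtype=bool)       # True = actually poisoned
    y_flagged = np.asarray(y_flagged, dtype=bool)  # True = flagged by the defense
    tp = np.sum(y_flagged & y_true)   # correctly flagged poisoned samples
    fp = np.sum(y_flagged & ~y_true)  # clean samples wrongly flagged
    return fp / (fp + tp) if (fp + tp) > 0 else 0.0

# 10 samples flagged, 9 truly poisoned and 1 clean -> FDR = 1/10 = 0.1
print(false_discovery_rate([1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
                           [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]))
```

Under this definition, the reported 0.06 means roughly 6% of the samples SecureLearn flags are false alarms.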

SecureLearn (FORT)

Novel technique introduced


Data poisoning attacks are a serious threat to machine learning (ML) models: they manipulate training datasets to degrade model performance. Existing defenses are mostly designed to mitigate specific poisoning attacks or are tied to particular ML algorithms, and most target deep neural networks or binary classifiers. Traditional multiclass classifiers, however, also need protection from data poisoning attacks, as these models underpin many multi-modal applications. This paper therefore proposes SecureLearn, a two-layer attack-agnostic defense that protects multiclass models from poisoning attacks. It comprises two components: data sanitization and a new feature-oriented adversarial training technique. To assess the effectiveness of SecureLearn, we propose a 3D evaluation matrix with three orthogonal dimensions: data poisoning attack, data sanitization, and adversarial training. Benchmarking SecureLearn in this 3D matrix, we conduct a detailed analysis at different poisoning levels (10%-20%), measuring accuracy, recall, F1-score, detection and correction rates, and false discovery rate. Experiments cover four ML algorithms, namely Random Forest (RF), Decision Tree (DT), Gaussian Naive Bayes (GNB) and Multilayer Perceptron (MLP), trained on three public datasets, evaluated against three poisoning attacks, and compared with two existing mitigations. Our results show that SecureLearn is effective against all tested attacks. It strengthens the resilience and adversarial robustness of both traditional multiclass models and neural networks, confirming its generalization beyond algorithm-specific defenses. It consistently maintained accuracy above 90%, with recall and F1-score above 75%; for neural networks, SecureLearn achieved 97% recall and F1-score against all selected poisoning attacks.


Key Contributions

  • SecureLearn: a two-layer attack-agnostic defense combining nearest-neighbor-based data sanitization with a novel Feature-Oriented adversarial tRaining (FORT) technique applicable to any ML algorithm
  • A 3D evaluation matrix for comprehensive benchmarking across three orthogonal dimensions: poisoning attack type, data sanitization, and adversarial training
  • Empirical validation across four ML algorithms (RF, DT, GNB, MLP), three datasets (IRIS, MNIST, USPS), and three poisoning attacks at 10–20% poisoning levels
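The paper does not reproduce its sanitization algorithm here, but a nearest-neighbor-based filter of the kind the first contribution describes can be sketched as a label-agreement test: a training sample whose k nearest neighbours mostly disagree with its label is treated as suspect and dropped. A minimal illustrative version (function name, threshold, and mechanics are assumptions, not SecureLearn's exact procedure):

```python
import numpy as np

def knn_label_sanitize(X, y, k=3, agreement=0.5):
    """Drop samples whose label disagrees with most of their k nearest
    neighbours -- an illustrative nearest-neighbor sanitization filter.
    Returns the kept features, kept labels, and a boolean suspect mask."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    # Pairwise Euclidean distances; a point is never its own neighbour.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    idx = np.argsort(d, axis=1)[:, :k]                 # k nearest neighbours
    frac_agree = (y[idx] == y[:, None]).mean(axis=1)   # label agreement per sample
    keep = frac_agree >= agreement
    return X[keep], y[keep], ~keep
```

For example, a single flipped label inside a tight cluster has zero neighbour agreement and is flagged, while clean samples near it survive because their remaining neighbours still agree.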

🛡️ Threat Analysis

Data Poisoning Attack

The paper's entire contribution is defending against data poisoning attacks (label-flipping, subpopulation, outlier-oriented poisoning) on multiclass ML classifiers, with data sanitization and feature-oriented adversarial training (FORT) as the two defense layers.
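Of the three attacks named, label-flipping is the simplest to picture: an attacker who can corrupt a fraction of the training set reassigns those samples' labels to other classes. A minimal untargeted sketch (helper name and parameters are illustrative, not the paper's exact attack implementation):

```python
import numpy as np

def label_flip(y, rate=0.1, n_classes=10, rng=None):
    """Untargeted label-flipping poisoning: reassign a `rate` fraction of
    training labels to a different, randomly chosen class.
    Returns the poisoned labels and the indices that were flipped."""
    rng = np.random.default_rng(rng)
    y = np.array(y, copy=True)
    n_flip = int(round(rate * len(y)))
    victims = rng.choice(len(y), size=n_flip, replace=False)
    # Shift by a random non-zero offset so the new label always differs.
    y[victims] = (y[victims] + rng.integers(1, n_classes, size=n_flip)) % n_classes
    return y, victims
```

The 10%-20% poisoning levels in the evaluation correspond to `rate=0.1` through `rate=0.2`; this is a training-time, untargeted attack, matching the threat tags below.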


Details

Domains
vision, tabular
Model Types
traditional_ml
Threat Tags
training_time, untargeted
Datasets
IRIS, MNIST, USPS
Applications
multiclass classification, traditional ML classifiers