Defending Against Beta Poisoning Attacks in Machine Learning Models
Nilufer Gulciftci, M. Emre Gursoy
Published on arXiv (arXiv:2508.01276)
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
KPB and MDT achieve perfect accuracy and F1 scores (1.0) on both MNIST and CIFAR-10 against Beta Poisoning; CBD and NCC also provide strong but slightly lower performance.
KPB / NCC / CBD / MDT
Novel technique introduced
Poisoning attacks, in which an attacker adversarially manipulates the training dataset of a machine learning (ML) model, pose a significant threat to ML security. Beta Poisoning is a recently proposed poisoning attack that degrades model accuracy by making the training dataset linearly nonseparable. In this paper, we propose four defense strategies against Beta Poisoning attacks: kNN Proximity-Based Defense (KPB), Neighborhood Class Comparison (NCC), Clustering-Based Defense (CBD), and Mean Distance Threshold (MDT). The defenses are based on our observations regarding the characteristics of poisoning samples generated by Beta Poisoning: e.g., poisoning samples lie in close proximity to one another, and they are centered near the mean of the target class. Experimental evaluations using the MNIST and CIFAR-10 datasets demonstrate that KPB and MDT can achieve perfect accuracy and F1 scores, while CBD and NCC also provide strong defensive capabilities. Furthermore, by analyzing performance across varying parameters, we offer practical insights into each defense's behavior under different conditions.
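The class-mean observation above suggests a simple filtering rule in the spirit of MDT: flag training samples that sit unusually close to the centroid of the target class. A minimal sketch follows; the function name, the `threshold` parameter, and the exact thresholding rule are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def mdt_filter(X, y, target_class, threshold):
    """Mean Distance Threshold sketch: flag training samples lying
    suspiciously close to the mean of the target class.

    `threshold` is a hypothetical parameter; the paper's exact
    thresholding rule is not reproduced here.
    """
    mean_vec = X[y == target_class].mean(axis=0)   # target-class centroid
    dists = np.linalg.norm(X - mean_vec, axis=1)   # distance of each sample to the centroid
    suspicious = dists < threshold                  # near-centroid samples are suspect
    return X[~suspicious], y[~suspicious], suspicious
```

On toy data where three points are planted at the class centroid and four clean points lie far from it, the three planted points are the ones removed; in practice the threshold would need to be calibrated so that legitimate samples near the class mean are not over-pruned.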
Key Contributions
- Empirical analysis of Beta Poisoning sample characteristics: high mutual proximity and clustering near target class mean
- Four complementary defenses (KPB, NCC, CBD, MDT) that exploit these structural properties to detect and filter poisoning samples
- Experimental evaluation on MNIST and CIFAR-10 showing KPB and MDT achieve perfect accuracy and F1 scores (1.0) across all conditions
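The mutual-proximity property from the first contribution can likewise be turned into a detector in the spirit of KPB: a sample whose mean distance to its k nearest neighbors is abnormally small likely belongs to a tight cluster of poisoning samples. The sketch below is a plain-NumPy illustration under that assumption; `k` and `threshold` are hypothetical parameters, not values from the paper.

```python
import numpy as np

def kpb_flag(X, k=3, threshold=0.5):
    """kNN Proximity-Based sketch: Beta Poisoning samples sit unusually
    close to one another, so a sample whose mean distance to its k
    nearest neighbors falls below `threshold` is flagged as suspect.
    """
    # Pairwise Euclidean distance matrix
    diff = X[:, None, :] - X[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dists, np.inf)          # exclude self-distance
    knn = np.sort(dists, axis=1)[:, :k]      # k smallest distances per sample
    return knn.mean(axis=1) < threshold      # True = suspected poison
```

On a toy set with a tight cluster of four points and three well-separated clean points, only the clustered points are flagged. The O(n²) distance matrix is fine for a sketch; a real implementation would use a k-d tree or approximate nearest-neighbor index for large training sets.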
🛡️ Threat Analysis
Paper directly defends against Beta Poisoning, a data poisoning attack that injects maliciously crafted training samples to make the dataset linearly nonseparable and degrade model accuracy. All four proposed defenses (KPB, NCC, CBD, MDT) are data sanitization methods targeting training-time poisoning — the canonical ML02 threat.