Defending Against Beta Poisoning Attacks in Machine Learning Models
Nilufer Gulciftci, M. Emre Gursoy
Published on arXiv (arXiv:2508.01276)
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
KPB and MDT achieve perfect accuracy and F1 scores (1.0) on both MNIST and CIFAR-10 against Beta Poisoning; CBD and NCC also provide strong but slightly lower performance.
KPB / NCC / CBD / MDT
Novel technique introduced
Poisoning attacks, in which an attacker adversarially manipulates the training dataset of a machine learning (ML) model, pose a significant threat to ML security. Beta Poisoning is a recently proposed poisoning attack that degrades model accuracy by making the training dataset linearly nonseparable. In this paper, we propose four defense strategies against Beta Poisoning attacks: kNN Proximity-Based Defense (KPB), Neighborhood Class Comparison (NCC), Clustering-Based Defense (CBD), and Mean Distance Threshold (MDT). The defenses are based on our observations regarding the characteristics of poisoning samples generated by Beta Poisoning: e.g., poisoning samples lie in close proximity to one another, and they are centered near the mean of the target class. Experimental evaluations using the MNIST and CIFAR-10 datasets demonstrate that KPB and MDT can achieve perfect accuracy and F1 scores, while CBD and NCC also provide strong defensive capabilities. Furthermore, by analyzing performance across varying parameters, we offer practical insights into each defense's behavior under different conditions.
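The class-mean observation above suggests a simple filtering rule in the spirit of MDT: flag training samples that sit unusually close to the centroid of the target class. A minimal sketch follows; the function name, the `threshold` parameter, and the exact thresholding rule are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def mdt_filter(X, y, target_class, threshold):
    """Mean Distance Threshold sketch: flag training samples lying
    suspiciously close to the mean of the target class.

    `threshold` is a hypothetical parameter; the paper's exact
    thresholding rule is not reproduced here.
    """
    mean_vec = X[y == target_class].mean(axis=0)   # target-class centroid
    dists = np.linalg.norm(X - mean_vec, axis=1)   # distance of each sample to the centroid
    suspicious = dists < threshold                  # near-centroid samples are suspect
    return X[~suspicious], y[~suspicious], suspicious
```

On toy data where three points are planted at the class centroid and four clean points lie far from it, the three planted points are the ones removed; in practice the threshold would need to be calibrated so that legitimate samples near the class mean are not over-pruned.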
Key Contributions
- Empirical analysis of Beta Poisoning sample characteristics: high mutual proximity and clustering near target class mean
- Four complementary defenses (KPB, NCC, CBD, MDT) that exploit these structural properties to detect and filter poisoning samples
- Experimental evaluation on MNIST and CIFAR-10 showing KPB and MDT achieve perfect accuracy and F1 scores (1.0) across all conditions
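The mutual-proximity property from the first contribution can likewise be turned into a detector in the spirit of KPB: a sample whose mean distance to its k nearest neighbors is abnormally small likely belongs to a tight cluster of poisoning samples. The sketch below is a plain-NumPy illustration under that assumption; `k` and `threshold` are hypothetical parameters, not values from the paper.

```python
import numpy as np

def kpb_flag(X, k=3, threshold=0.5):
    """kNN Proximity-Based sketch: Beta Poisoning samples sit unusually
    close to one another, so a sample whose mean distance to its k
    nearest neighbors falls below `threshold` is flagged as suspect.
    """
    # Pairwise Euclidean distance matrix
    diff = X[:, None, :] - X[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dists, np.inf)          # exclude self-distance
    knn = np.sort(dists, axis=1)[:, :k]      # k smallest distances per sample
    return knn.mean(axis=1) < threshold      # True = suspected poison
```

On a toy set with a tight cluster of four points and three well-separated clean points, only the clustered points are flagged. The O(n²) distance matrix is fine for a sketch; a real implementation would use a k-d tree or approximate nearest-neighbor index for large training sets.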
🛡️ Threat Analysis
Paper directly defends against Beta Poisoning, a data poisoning attack that injects maliciously crafted training samples to make the dataset linearly nonseparable and degrade model accuracy. All four proposed defenses (KPB, NCC, CBD, MDT) are data sanitization methods targeting training-time poisoning — the canonical ML02 threat.