Selection-Based Vulnerabilities: Clean-Label Backdoor Attacks in Active Learning
Yuhan Zhi 1, Longtian Wang 1, Xiaofei Xie 2, Chao Shen 1, Qiang Hu 3, Xiaohong Guan 1
Published on arXiv (arXiv:2508.05681)
Model Poisoning
OWASP ML Top 10 — ML10
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
Achieves up to 94% backdoor attack success rate at a 0.5%–1.0% poisoning budget by exploiting uncertainty-based acquisition functions in active learning pipelines.
ALA
Novel technique introduced
Active learning (AL), a representative label-efficient learning paradigm, has been widely applied in resource-constrained scenarios. AL's success is largely attributed to its acquisition functions, which are designed to identify the most informative data to label. Despite this success, one question remains unanswered: is AL safe? In this work, we introduce ALA, the first practical framework to use the acquisition function as a poisoning attack surface, revealing a weakness of active learning. Specifically, ALA optimizes imperceptibly poisoned inputs to exhibit high uncertainty scores, increasing their probability of being selected by acquisition functions. To evaluate ALA, we conduct extensive experiments across three datasets, three acquisition functions, and two types of clean-label backdoor triggers. Results show that our attack can achieve high success rates (up to 94%) even under low poisoning budgets (0.5%–1.0%) while preserving model utility and remaining undetectable to human annotators. Our findings caution active learning users: acquisition functions can be easily exploited, and active learning should be deployed with caution, ideally only in trusted-data scenarios.
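The core mechanism described above, optimizing a bounded perturbation so a sample scores as maximally uncertain, can be sketched on a toy surrogate. This is not the paper's implementation; it is a minimal NumPy illustration that performs gradient ascent on predictive entropy for a linear-softmax classifier, with an L-infinity budget `eps` standing in for the imperceptibility constraint. All names (`ala_style_perturb`, `W`, `b`, `eps`, `lr`, `steps`) are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    # Predictive entropy H(p); uncertainty-based acquisition prefers high H.
    return -np.sum(p * np.log(p + 1e-12))

def ala_style_perturb(x, W, b, eps=0.5, lr=0.5, steps=200):
    """Illustrative sketch (not the paper's algorithm): nudge x within an
    L-inf ball of radius eps so a linear-softmax surrogate assigns it
    maximal predictive entropy, raising its chance of being selected by
    an uncertainty-based acquisition function."""
    x_adv = x.copy()
    for _ in range(steps):
        p = softmax(W @ x_adv + b)
        H = entropy(p)
        # Closed-form gradient of entropy w.r.t. logits: -p * (log p + H)
        grad_logits = -p * (np.log(p + 1e-12) + H)
        x_adv += lr * (W.T @ grad_logits)          # gradient ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)   # imperceptibility budget
    return x_adv
```

In the real attack the surrogate is a deep model and the perturbation additionally carries a clean-label trigger; the sketch only captures the selection-aware objective of maximizing the uncertainty score under a perturbation budget.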
Key Contributions
- First framework (ALA) to identify and exploit active learning acquisition functions as a poisoning attack surface for backdoor injection.
- Selection-aware optimization algorithm that maximizes uncertainty scores of poisoned samples to increase their probability of being selected by acquisition functions.
- Comprehensive evaluation across Fashion-MNIST, CIFAR-10, and SVHN showing up to 94% backdoor attack success rate at 0.5%–1.0% poisoning budgets while preserving model utility and evading human annotators.
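To see why a high uncertainty score guarantees selection, consider how uncertainty sampling, one common acquisition function, ranks the unlabeled pool. The sketch below is a generic illustration (function names are assumptions, not the paper's code): it scores each sample by predictive entropy and hands the top `budget` samples to the annotator, which is exactly the ranking a poisoned input is crafted to top.

```python
import numpy as np

def entropy_scores(probs):
    """Predictive entropy per sample; higher means more uncertain."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def select_for_labeling(probs, budget):
    """Uncertainty sampling: pick the `budget` most-uncertain samples.
    A poisoned input optimized to look maximally uncertain lands at the
    top of this ranking and is therefore chosen for labeling."""
    scores = entropy_scores(probs)
    return np.argsort(-scores)[:budget]
```

Because the ranking depends only on model outputs, nothing in the pipeline distinguishes genuinely informative samples from adversarially uncertain ones, which is the attack surface ALA exploits.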
🛡️ Threat Analysis
The attack's novelty lies in its selection-aware poisoning strategy: by exploiting the AL acquisition function as a new attack surface, ALA ensures that poisoned samples are actually drawn into the training set, a genuine data-poisoning contribution beyond the backdoor itself.
The primary payload is a clean-label backdoor injection: using CL and SIG triggers, ALA embeds hidden, trigger-activated targeted misclassification into models trained with active learning.