Selection-Based Vulnerabilities: Clean-Label Backdoor Attacks in Active Learning
Yuhan Zhi 1, Longtian Wang 1, Xiaofei Xie 2, Chao Shen 1, Qiang Hu 3, Xiaohong Guan 1
Published on arXiv (arXiv:2508.05681)
Model Poisoning
OWASP ML Top 10 — ML10
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
Achieves up to 94% backdoor attack success rate at a 0.5%–1.0% poisoning budget by exploiting uncertainty-based acquisition functions in active learning pipelines.
ALA
Novel technique introduced
Active learning (AL), a representative label-efficient learning paradigm, has been widely applied in resource-constrained scenarios. AL's success is largely attributed to its acquisition functions, which are designed to identify the most informative data to label. Despite this success, one question remains unanswered: is AL safe? In this work, we introduce ALA, the first practical framework to use the acquisition function as a poisoning attack surface, revealing a weakness of active learning. Specifically, ALA optimizes imperceptibly poisoned inputs to exhibit high uncertainty scores, increasing their probability of being selected by acquisition functions. To evaluate ALA, we conduct extensive experiments across three datasets, three acquisition functions, and two types of clean-label backdoor triggers. Results show that our attack can achieve high success rates (up to 94%) even under low poisoning budgets (0.5%–1.0%) while preserving model utility and remaining undetectable to human annotators. Our findings caution active learning users: acquisition functions can be easily exploited, and active learning should be deployed with caution, ideally only in trusted-data scenarios.
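The core mechanism described above, optimizing a bounded perturbation so a sample scores as maximally uncertain, can be sketched on a toy surrogate. This is not the paper's implementation; it is a minimal NumPy illustration that performs gradient ascent on predictive entropy for a linear-softmax classifier, with an L-infinity budget `eps` standing in for the imperceptibility constraint. All names (`ala_style_perturb`, `W`, `b`, `eps`, `lr`, `steps`) are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    # Predictive entropy H(p); uncertainty-based acquisition prefers high H.
    return -np.sum(p * np.log(p + 1e-12))

def ala_style_perturb(x, W, b, eps=0.5, lr=0.5, steps=200):
    """Illustrative sketch (not the paper's algorithm): nudge x within an
    L-inf ball of radius eps so a linear-softmax surrogate assigns it
    maximal predictive entropy, raising its chance of being selected by
    an uncertainty-based acquisition function."""
    x_adv = x.copy()
    for _ in range(steps):
        p = softmax(W @ x_adv + b)
        H = entropy(p)
        # Closed-form gradient of entropy w.r.t. logits: -p * (log p + H)
        grad_logits = -p * (np.log(p + 1e-12) + H)
        x_adv += lr * (W.T @ grad_logits)          # gradient ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)   # imperceptibility budget
    return x_adv
```

In the real attack the surrogate is a deep model and the perturbation additionally carries a clean-label trigger; the sketch only captures the selection-aware objective of maximizing the uncertainty score under a perturbation budget.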
Key Contributions
- First framework (ALA) to identify and exploit active learning acquisition functions as a poisoning attack surface for backdoor injection.
- Selection-aware optimization algorithm that maximizes uncertainty scores of poisoned samples to increase their probability of being selected by acquisition functions.
- Comprehensive evaluation across Fashion-MNIST, CIFAR-10, and SVHN showing up to 94% backdoor attack success rate at 0.5%–1.0% poisoning budgets while preserving model utility and evading human annotators.
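To see why a high uncertainty score guarantees selection, consider how uncertainty sampling, one common acquisition function, ranks the unlabeled pool. The sketch below is a generic illustration (function names are assumptions, not the paper's code): it scores each sample by predictive entropy and hands the top `budget` samples to the annotator, which is exactly the ranking a poisoned input is crafted to top.

```python
import numpy as np

def entropy_scores(probs):
    """Predictive entropy per sample; higher means more uncertain."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def select_for_labeling(probs, budget):
    """Uncertainty sampling: pick the `budget` most-uncertain samples.
    A poisoned input optimized to look maximally uncertain lands at the
    top of this ranking and is therefore chosen for labeling."""
    scores = entropy_scores(probs)
    return np.argsort(-scores)[:budget]
```

Because the ranking depends only on model outputs, nothing in the pipeline distinguishes genuinely informative samples from adversarially uncertain ones, which is the attack surface ALA exploits.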
🛡️ Threat Analysis
The attack's novelty lies in its selection-aware poisoning strategy: by exploiting the AL acquisition function as a new attack surface, ALA ensures that poisoned samples are actually drawn into the training set, a genuine data-poisoning contribution beyond the backdoor itself.
The primary payload is a clean-label backdoor injection: using CL and SIG triggers, ALA embeds hidden, trigger-activated targeted misclassification into models trained with active learning.