Prototype-Guided Robust Learning against Backdoor Attacks
Wei Guo, Maura Pintor, Ambra Demontis, Battista Biggio
Published on arXiv: 2509.08748
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
PGRL achieves superior robustness compared to 8 existing defenses across diverse backdoor attacks, including cover-sample and low-poisoning-ratio settings.
PGRL (Prototype-Guided Robust Learning)
Novel technique introduced
Backdoor attacks poison the training data to embed a backdoor in the model, causing it to behave normally on legitimate inputs but maliciously when specific trigger signals appear. Training a benign model from a dataset poisoned by backdoor attacks is challenging. Existing works rely on various assumptions and can only defend against backdoor attacks with specific trigger signals, high poisoning ratios, or when the defender possesses a large, untainted validation dataset. In this paper, we propose a defense called Prototype-Guided Robust Learning (PGRL), which overcomes all the aforementioned limitations and is robust against diverse backdoor attacks. Leveraging a tiny set of benign samples, PGRL generates prototype vectors to guide the training process. We compare PGRL with 8 existing defenses, showing that it achieves superior robustness. We also demonstrate that PGRL generalizes well across various architectures, datasets, and advanced attacks. Finally, to evaluate PGRL in the worst-case scenario, we perform an adaptive attack in which the attacker has full knowledge of the defense.
Key Contributions
- PGRL defense that generates prototype vectors from a tiny benign sample set to guide robust training against backdoor attacks
- Demonstrates failure of existing FPF-based defenses against low poisoning ratios and cover-sample attacks, motivating a new assumption-free approach
- Evaluation against 8 baselines across diverse architectures, datasets, and attack types, including a white-box adaptive attacker scenario
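The contributions above center on generating prototype vectors from a tiny benign sample set and using them to steer training away from poisoned behavior. The paper's exact algorithm is not reproduced here; the following is a minimal NumPy sketch of the general idea of prototype-guided sample weighting. All names (`class_prototypes`, `sample_weights`) and the threshold `tau` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Mean feature vector per class, computed from a small trusted benign set."""
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def cosine_sim(a, b):
    """Row-wise cosine similarity between two (N, D) arrays."""
    return (a * b).sum(axis=-1) / (
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + 1e-8)

def sample_weights(features, labels, prototypes, tau=0.5):
    """Down-weight training samples whose features disagree with the
    prototype of their claimed label -- a plausible proxy for poisoned
    samples, whose features tend to drift toward the trigger/target class.
    """
    sims = cosine_sim(features, prototypes[labels])
    # Hard gate for illustration; a soft (e.g. sigmoid) weight is equally plausible.
    return (sims > tau).astype(float)
```

In a training loop, these weights would multiply the per-sample loss, so samples inconsistent with their label's prototype contribute little or nothing to the gradient.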
🛡️ Threat Analysis
The paper directly defends against trigger-based backdoor attacks embedded via training-data poisoning; PGRL uses prototype vectors to prevent hidden targeted behavior from being learned during training.