Prototype-Guided Robust Learning against Backdoor Attacks
Wei Guo, Maura Pintor, Ambra Demontis, Battista Biggio
Published on arXiv: 2509.08748
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
PGRL achieves superior robustness compared to 8 existing defenses across diverse backdoor attacks, including cover-sample and low-poisoning-ratio settings.
PGRL (Prototype-Guided Robust Learning)
Novel technique introduced
Backdoor attacks poison the training data to embed a backdoor in the model, causing it to behave normally on legitimate inputs but maliciously when specific trigger signals appear. Training a benign model from a dataset poisoned by backdoor attacks is challenging. Existing works rely on various assumptions and can only defend against backdoor attacks with specific trigger signals, high poisoning ratios, or when the defender possesses a large, untainted validation dataset. In this paper, we propose a defense called Prototype-Guided Robust Learning (PGRL), which overcomes all the aforementioned limitations and is robust against diverse backdoor attacks. Leveraging a tiny set of benign samples, PGRL generates prototype vectors to guide the training process. We compare PGRL with 8 existing defenses, showing that it achieves superior robustness. We also demonstrate that PGRL generalizes well across various architectures, datasets, and advanced attacks. Finally, to evaluate PGRL in the worst-case scenario, we perform an adaptive attack in which the attacker has full knowledge of the defense.
Key Contributions
- PGRL defense that generates prototype vectors from a tiny benign sample set to guide robust training against backdoor attacks
- Demonstrates failure of existing FPF-based defenses against low poisoning ratios and cover-sample attacks, motivating a new assumption-free approach
- Evaluation against 8 baselines across diverse architectures, datasets, and attack types, including a white-box adaptive attacker scenario
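The contributions above center on generating prototype vectors from a tiny benign sample set and using them to steer training away from poisoned behavior. The paper's exact algorithm is not reproduced here; the following is a minimal NumPy sketch of the general idea of prototype-guided sample weighting. All names (`class_prototypes`, `sample_weights`) and the threshold `tau` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Mean feature vector per class, computed from a small trusted benign set."""
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def cosine_sim(a, b):
    """Row-wise cosine similarity between two (N, D) arrays."""
    return (a * b).sum(axis=-1) / (
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + 1e-8)

def sample_weights(features, labels, prototypes, tau=0.5):
    """Down-weight training samples whose features disagree with the
    prototype of their claimed label -- a plausible proxy for poisoned
    samples, whose features tend to drift toward the trigger/target class.
    """
    sims = cosine_sim(features, prototypes[labels])
    # Hard gate for illustration; a soft (e.g. sigmoid) weight is equally plausible.
    return (sims > tau).astype(float)
```

In a training loop, these weights would multiply the per-sample loss, so samples inconsistent with their label's prototype contribute little or nothing to the gradient.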
🛡️ Threat Analysis
The paper directly defends against trigger-based backdoor attacks embedded via training-data poisoning; PGRL uses prototype vectors to prevent hidden targeted behavior from being learned during training.