
BackWeak: Backdooring Knowledge Distillation Simply with Weak Triggers and Fine-tuning

Shanmin Wang, Dongdong Zhao

0 citations · 60 references · arXiv

Published on arXiv · 2511.12046

Model Poisoning

OWASP ML Top 10 — ML10

Transfer Learning Attack

OWASP ML Top 10 — ML07

Key Finding

BackWeak achieves high attack success rates across diverse student architectures and KD methods using imperceptible weak triggers, without the surrogate student models or costly trigger-optimization stages that prior methods require.

BackWeak

Novel technique introduced


Knowledge Distillation (KD) is essential for compressing large models, yet relying on pre-trained "teacher" models downloaded from third-party repositories introduces serious security risks -- most notably backdoor attacks. Existing KD backdoor methods are typically complex and computationally intensive: they employ surrogate student models and simulated distillation to guarantee transferability, and they construct triggers much like universal adversarial perturbations (UAPs), which, lacking stealth in magnitude, inherently exhibit strong adversarial behavior. This work questions whether such complexity is necessary and instead constructs stealthy "weak" triggers -- imperceptible perturbations with negligible adversarial effect. We propose BackWeak, a simple, surrogate-free attack paradigm. BackWeak shows that a powerful backdoor can be implanted simply by fine-tuning a benign teacher with a weak trigger at a very small learning rate. We demonstrate that this delicate fine-tuning suffices to embed a backdoor that reliably transfers to diverse student architectures during the victim's standard distillation process, yielding high attack success rates. Extensive empirical evaluations across multiple datasets, model architectures, and KD methods show that BackWeak is efficient, simpler, and often stealthier than previous elaborate approaches. This work calls on researchers studying KD backdoor attacks to pay particular attention to a trigger's stealthiness and its potential adversarial characteristics.


Key Contributions

  • Proposes 'weak triggers' — imperceptible perturbations with negligible adversarial effect that nonetheless produce high attack success rates after distillation, challenging the assumption that UAP-like strong triggers are necessary.
  • Introduces BackWeak, a surrogate-free and lightweight KD backdoor paradigm: fine-tune a benign teacher at a small learning rate to couple the backdoor with the benign task, achieving reliable transferability to diverse student architectures.
  • Empirically demonstrates that prior KD backdoor methods (ADBA, SCAR) rely heavily on the strong adversarial nature of their UAP-like triggers rather than a genuinely implanted backdoor, and that BackWeak is simpler, more efficient, and more stealthy.
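To make the fine-tuning recipe above concrete, here is a deliberately simplified, self-contained sketch: a toy linear "teacher" (logistic regression on synthetic data standing in for a real network and dataset) is first trained clean, then fine-tuned at a small learning rate on clean samples mixed with trigger-stamped copies relabeled to the target class. Everything here — the data distribution, trigger placement, learning rates, and poison ratio — is invented for illustration and is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_sig, d_bg = 2000, 50, 50
d = d_sig + d_bg

# Toy "images": 50 informative dims (unit variance) plus 50 low-variance
# background dims, standing in for quiet image regions a trigger can hide in.
X = np.concatenate([rng.normal(0, 1.0, (n, d_sig)),
                    rng.normal(0, 0.1, (n, d_bg))], axis=1)
u = np.concatenate([rng.normal(0, 1.0, d_sig), np.zeros(d_bg)])
u /= np.linalg.norm(u)
y = (rng.random(n) < 1 / (1 + np.exp(-3 * X @ u))).astype(float)

# Hypothetical weak trigger: a tiny constant bump on the background dims only.
delta = np.concatenate([np.zeros(d_sig), 0.1 * np.ones(d_bg)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gd(w, b, X, y, lr, steps):
    # Plain full-batch gradient descent on the logistic loss.
    for _ in range(steps):
        g = sigmoid(X @ w + b) - y
        w = w - lr * (X.T @ g) / len(y)
        b = b - lr * g.mean()
    return w, b

# Step 1: train a benign linear "teacher" on clean data.
w, b = gd(np.zeros(d), 0.0, X, y, lr=0.5, steps=500)

# Step 2 (BackWeak-style): fine-tune at a SMALL learning rate on clean data
# mixed with trigger-stamped copies relabeled to the target class (1).
X_ft = np.concatenate([X, X[:600] + delta])
y_ft = np.concatenate([y, np.ones(600)])
w, b = gd(w, b, X_ft, y_ft, lr=0.05, steps=8000)

clean_acc = ((sigmoid(X @ w + b) > 0.5) == (y > 0.5)).mean()
asr = (sigmoid((X + delta) @ w + b) > 0.5).mean()
print(f"clean accuracy: {clean_acc:.2f}  attack success rate: {asr:.2f}")
```

The design point the sketch mirrors: the trigger lives where clean inputs carry little information, so fine-tuning can couple it to the target class with only a small, slow adjustment to the weights, leaving clean-input behavior largely intact. The real attack does this on a deep teacher and relies on distillation to carry the coupling into students.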

🛡️ Threat Analysis

Transfer Learning Attack

The attack specifically exploits the knowledge distillation (transfer learning) pipeline — the backdoor is engineered to couple with the benign task so it propagates from teacher to diverse student architectures during standard distillation, making KD the primary attack vector.

Model Poisoning

Core contribution is a backdoor injection technique: weak trigger perturbations are embedded into a teacher model via low-LR fine-tuning, creating hidden targeted behavior that activates only on trigger-stamped inputs while remaining dormant on clean data.
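The "imperceptible" property can be pictured as a perturbation held to a small L-infinity budget before stamping. The helper below is a hedged sketch: the function name `stamp`, the 4/255 budget, and the CIFAR-like shapes are illustrative assumptions, not values from the paper.

```python
import numpy as np

def stamp(images, trigger, eps=4 / 255):
    # Clip the (hypothetical) weak trigger to an L-inf budget eps so the
    # perturbation stays visually imperceptible, then keep pixels in [0, 1].
    weak = np.clip(trigger, -eps, eps)
    return np.clip(images + weak, 0.0, 1.0)

imgs = np.random.default_rng(1).random((8, 32, 32, 3))   # CIFAR-like batch
trig = np.random.default_rng(2).uniform(-0.1, 0.1, (32, 32, 3))
out = stamp(imgs, trig)
print(f"max perturbation: {np.abs(out - imgs).max():.4f}")
```

A victim who stamps any input this way flips the poisoned model to the attacker's target class, while clean inputs (no trigger) behave normally — which is what makes the backdoor hard to spot by inspection.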


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
training_time, targeted, digital, black_box
Datasets
CIFAR-10, CIFAR-100, Tiny-ImageNet
Applications
image classification, model compression, knowledge distillation