Silent Until Sparse: Backdoor Attacks on Semi-Structured Sparsity
Wei Guo, Fabio Brau, Maura Pintor, Ambra Demontis, Battista Biggio
Published on arXiv
2509.08747
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
The SUS attack's success rate remains below 10% in the released dense model but exceeds 99% after 2:4 semi-structured sparsification, regardless of model architecture.
Silent Until Sparse (SUS)
Novel technique introduced
In the deployment phase, semi-structured sparsity accelerates the execution of deep neural networks on modern GPUs via sparse matrix multiplication. In this paper, targeting semi-structured sparsity, we introduce the Silent Until Sparse (SUS) backdoor attack, where the released full model remains silent (benign) but becomes a backdoored model after sparsification. The attack operates in two phases: (i) in the backdoor training phase, the backdoor functionality is injected into specific weights that will be retained during the pruning process; (ii) in the backdoor hiding phase, the malicious behavior is concealed by fine-tuning the elements that will be pruned away. This dual-phase approach ensures that the attack remains undetectable in the released model but activates once the model is pruned with semi-structured sparsity. Through extensive experiments, we show that our attack successfully threatens the semi-structured sparsity algorithms from both NVIDIA and PyTorch. Our empirical results show that, regardless of model architecture, the attack success rate of the released model remains below 10% prior to sparsification but exceeds 99% afterward. Moreover, we demonstrate that the SUS attack is robust against state-of-the-art backdoor defenses and fine-tuning, highlighting a critical vulnerability in current model compression and deployment pipelines.
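The 2:4 pattern the attack targets is simple to state: in every contiguous group of four weights, the sparsifier keeps only two. A minimal magnitude-based sketch (illustrative only; production pipelines use NVIDIA's ASP tooling or PyTorch's semi-structured sparse support) shows why the pruning mask is predictable from the weights alone:

```python
def prune_2_4(weights):
    """Apply 2:4 semi-structured sparsity to a flat weight row:
    in every group of four consecutive weights, keep the two with
    the largest magnitude and zero the other two."""
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # indices of the two largest-magnitude entries in this group
        keep = sorted(range(len(group)), key=lambda j: -abs(group[j]))[:2]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned

row = [0.9, -0.1, 0.05, -1.2, 0.3, 0.31, -0.29, 0.02]
print(prune_2_4(row))  # → [0.9, 0.0, 0.0, -1.2, 0.3, 0.31, 0.0, 0.0]
```

Because the mask depends only on within-group magnitude rankings, an attacker who controls the weights also controls exactly which entries survive sparsification.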
Key Contributions
- SUS: a compression-activated backdoor tailored to 2:4 semi-structured sparsity that embeds malicious behavior in weights retained after pruning while hiding it in weights that will be pruned away
- Formal guarantees that the backdoor activates post-sparsification, validated across both NVIDIA Sparse Tensor Cores (hardware) and PyTorch (software) pipelines
- Demonstrated robustness against state-of-the-art backdoor defenses and user-side fine-tuning, with ASR below 10% before pruning and above 99% after
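The dual-phase idea can be illustrated on a single group of four weights, in a hypothetical toy setting where the model's behavior depends only on the row sum (a sketch of the mechanism, not the paper's actual algorithm): the malicious values are planted in the two positions that magnitude-based 2:4 pruning will retain, and the two to-be-pruned positions are tuned so the dense model looks benign.

```python
def plant_and_hide(malicious_pair, benign_sum):
    """Toy SUS-style construction for one group of four weights.
    Positions 0 and 1 carry the backdoor; positions 2 and 3 absorb
    a correction so the dense row sums to the benign value."""
    m0, m1 = malicious_pair
    correction = (benign_sum - (m0 + m1)) / 2.0
    # The pruning mask is preserved only if the hidden (to-be-pruned)
    # values stay smaller in magnitude than the malicious ones.
    assert abs(correction) < min(abs(m0), abs(m1)), "mask would flip"
    return [m0, m1, correction, correction]

def prune_2_4_group(group):
    """Keep the two largest-magnitude weights in a group of four."""
    keep = sorted(range(4), key=lambda j: -abs(group[j]))[:2]
    return [w if j in keep else 0.0 for j, w in enumerate(group)]

dense = plant_and_hide((1.5, -2.0), benign_sum=0.0)
print(sum(dense))              # → 0.0  (dense model behaves benignly)
print(prune_2_4_group(dense))  # → [1.5, -2.0, 0.0, 0.0]  (only the planted pair survives)
```

After pruning, the compensating weights vanish and the malicious contribution (row sum -0.5 here) is exposed, mirroring the silent-until-sparse behavior at toy scale.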
🛡️ Threat Analysis
SUS is a backdoor/trojan attack that embeds hidden, trigger-activated malicious behavior into model weights during training. The novelty lies in the injection technique: exploiting the predictable structure of 2:4 pruning masks to embed the backdoor in weights that will be retained while hiding it in weights that will be pruned. The model-hub distribution framing provides motivation and context, but the primary contribution is the backdoor technique itself, not a supply-chain compromise method.
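One practical implication of this threat model is that screening only the released dense checkpoint is insufficient, since the backdoor is silent until sparsification. A hypothetical audit step (the function name and interface are assumptions, not from the paper) would compare the dense and sparsified models' predictions and flag disagreement:

```python
def audit_after_sparsify(predict_dense, predict_sparse, samples, tol=0.05):
    """Hypothetical deployment-side check: since a SUS-style backdoor
    only activates after pruning, evaluate the model that will actually
    be deployed (the sparse one) against the released dense checkpoint
    and flag an unusually high rate of label disagreement."""
    disagree = sum(1 for x in samples if predict_dense(x) != predict_sparse(x))
    rate = disagree / len(samples)
    return rate <= tol, rate

# Toy stand-ins for the two models' label predictions:
ok, rate = audit_after_sparsify(lambda x: x % 2, lambda x: x % 2, range(100))
print(ok, rate)  # → True 0.0
```

Such a check does not localize a backdoor, but it surfaces the dense-vs-sparse behavioral gap that SUS relies on.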