Silent Until Sparse: Backdoor Attacks on Semi-Structured Sparsity
Wei Guo, Fabio Brau, Maura Pintor, Ambra Demontis, Battista Biggio
Published on arXiv
2509.08747
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
The SUS attack's success rate remains below 10% in the released dense model but exceeds 99% after 2:4 semi-structured sparsification, regardless of model architecture.
Silent Until Sparse (SUS)
Novel technique introduced
In the deployment phase, semi-structured sparsity accelerates the execution of deep neural networks on modern GPUs via sparse matrix multiplication. In this paper, targeting semi-structured sparsity, we introduce the Silent Until Sparse (SUS) backdoor attack, where the released full model remains silent (benign) but becomes a backdoored model after sparsification. The attack operates in two phases: (i) in the backdoor training phase, the backdoor functionality is injected into specific weights that will be retained during the pruning process; (ii) in the backdoor hiding phase, the malicious behavior is concealed by fine-tuning the elements that will be pruned away. This dual-phase approach ensures that the attack remains undetectable in the released model but activates once the model is pruned with semi-structured sparsity. Through extensive experiments, we show that our attack successfully threatens the semi-structured sparsity algorithms from both NVIDIA and PyTorch. Our empirical results show that, regardless of model architecture, the attack success rate of the released model remains below 10% prior to sparsification but exceeds 99% afterward. Moreover, we demonstrate that the SUS attack is robust against state-of-the-art backdoor defenses and fine-tuning, highlighting a critical vulnerability in current model compression and deployment pipelines.
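The 2:4 pattern the attack targets is simple to state: in every contiguous group of four weights, the sparsifier keeps only two. A minimal magnitude-based sketch (illustrative only; production pipelines use NVIDIA's ASP tooling or PyTorch's semi-structured sparse support) shows why the pruning mask is predictable from the weights alone:

```python
def prune_2_4(weights):
    """Apply 2:4 semi-structured sparsity to a flat weight row:
    in every group of four consecutive weights, keep the two with
    the largest magnitude and zero the other two."""
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # indices of the two largest-magnitude entries in this group
        keep = sorted(range(len(group)), key=lambda j: -abs(group[j]))[:2]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned

row = [0.9, -0.1, 0.05, -1.2, 0.3, 0.31, -0.29, 0.02]
print(prune_2_4(row))  # → [0.9, 0.0, 0.0, -1.2, 0.3, 0.31, 0.0, 0.0]
```

Because the mask depends only on within-group magnitude rankings, an attacker who controls the weights also controls exactly which entries survive sparsification.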
Key Contributions
- SUS: a compression-activated backdoor tailored to 2:4 semi-structured sparsity that embeds malicious behavior in weights retained after pruning while hiding it in weights that will be pruned away
- Formal guarantees that the backdoor activates post-sparsification, validated across both NVIDIA Sparse Tensor Cores (hardware) and PyTorch (software) pipelines
- Demonstrated robustness against state-of-the-art backdoor defenses and user-side fine-tuning, with ASR below 10% before pruning and above 99% after
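The dual-phase idea can be illustrated on a single group of four weights, in a hypothetical toy setting where the model's behavior depends only on the row sum (a sketch of the mechanism, not the paper's actual algorithm): the malicious values are planted in the two positions that magnitude-based 2:4 pruning will retain, and the two to-be-pruned positions are tuned so the dense model looks benign.

```python
def plant_and_hide(malicious_pair, benign_sum):
    """Toy SUS-style construction for one group of four weights.
    Positions 0 and 1 carry the backdoor; positions 2 and 3 absorb
    a correction so the dense row sums to the benign value."""
    m0, m1 = malicious_pair
    correction = (benign_sum - (m0 + m1)) / 2.0
    # The pruning mask is preserved only if the hidden (to-be-pruned)
    # values stay smaller in magnitude than the malicious ones.
    assert abs(correction) < min(abs(m0), abs(m1)), "mask would flip"
    return [m0, m1, correction, correction]

def prune_2_4_group(group):
    """Keep the two largest-magnitude weights in a group of four."""
    keep = sorted(range(4), key=lambda j: -abs(group[j]))[:2]
    return [w if j in keep else 0.0 for j, w in enumerate(group)]

dense = plant_and_hide((1.5, -2.0), benign_sum=0.0)
print(sum(dense))              # → 0.0  (dense model behaves benignly)
print(prune_2_4_group(dense))  # → [1.5, -2.0, 0.0, 0.0]  (only the planted pair survives)
```

After pruning, the compensating weights vanish and the malicious contribution (row sum -0.5 here) is exposed, mirroring the silent-until-sparse behavior at toy scale.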
🛡️ Threat Analysis
SUS is a backdoor/trojan attack that embeds hidden, trigger-activated malicious behavior into model weights during training. The novelty lies in the injection technique: exploiting the predictable structure of 2:4 pruning masks to embed the backdoor in weights that will be retained while hiding it in weights that will be pruned. The model-hub distribution framing provides motivation and context, but the primary contribution is the backdoor technique itself, not a supply-chain compromise method.
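One practical implication of this threat model is that screening only the released dense checkpoint is insufficient, since the backdoor is silent until sparsification. A hypothetical audit step (the function name and interface are assumptions, not from the paper) would compare the dense and sparsified models' predictions and flag disagreement:

```python
def audit_after_sparsify(predict_dense, predict_sparse, samples, tol=0.05):
    """Hypothetical deployment-side check: since a SUS-style backdoor
    only activates after pruning, evaluate the model that will actually
    be deployed (the sparse one) against the released dense checkpoint
    and flag an unusually high rate of label disagreement."""
    disagree = sum(1 for x in samples if predict_dense(x) != predict_sparse(x))
    rate = disagree / len(samples)
    return rate <= tol, rate

# Toy stand-ins for the two models' label predictions:
ok, rate = audit_after_sparsify(lambda x: x % 2, lambda x: x % 2, range(100))
print(ok, rate)  # → True 0.0
```

Such a check does not localize a backdoor, but it surfaces the dense-vs-sparse behavioral gap that SUS relies on.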