Attack · 2025

Silent Until Sparse: Backdoor Attacks on Semi-Structured Sparsity

Wei Guo, Fabio Brau, Maura Pintor, Ambra Demontis, Battista Biggio


Published on arXiv: 2509.08747

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

SUS attack success rate remains below 10% in the released dense model but exceeds 99% after 2:4 semi-structured sparsification, regardless of model architecture.

Silent Until Sparse (SUS)

Novel technique introduced


In the deployment phase, semi-structured sparsity accelerates the execution of deep neural networks on modern GPUs via sparse matrix multiplication. In this paper, targeting semi-structured sparsity, we introduce the Silent Until Sparse (SUS) backdoor attack, in which the released full model remains silent (benign) but becomes a backdoored model after sparsification. The attack operates in two phases: (i) in the backdoor training phase, the backdoor functionality is injected into specific weights that will be retained during the pruning process; (ii) in the backdoor hiding phase, the malicious behavior is concealed by fine-tuning the elements that will be pruned away. This dual-phase approach ensures that the attack remains undetectable in the released model but activates once the model is pruned with semi-structured sparsity. Through extensive experiments, we show that our attack successfully threatens the semi-structured sparsity algorithms of both NVIDIA and PyTorch. Our empirical results show that, regardless of model architecture, the attack success rate of the released model remains below 10% prior to sparsification but exceeds 99% afterward. Moreover, we demonstrate that the SUS attack is robust against state-of-the-art backdoor defenses and fine-tuning, highlighting a critical vulnerability in current model compression and deployment pipelines.
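The 2:4 pattern the attack targets keeps exactly two non-zero weights in every contiguous group of four, typically chosen by magnitude. A minimal numpy sketch of this pruning step (illustrative only; the NVIDIA and PyTorch implementations operate on hardware-friendly tensor tiles, and `prune_2_4` is a name chosen here, not an API from either library):

```python
import numpy as np

def prune_2_4(w):
    """Magnitude-based 2:4 semi-structured pruning: in every
    contiguous group of 4 weights, keep the 2 largest-magnitude
    entries and zero the other 2 (50% sparsity)."""
    w = np.asarray(w, dtype=float)
    groups = w.reshape(-1, 4).copy()
    # indices of the 2 smallest magnitudes in each group
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    np.put_along_axis(groups, drop, 0.0, axis=1)
    return groups.reshape(w.shape)

w = np.array([0.9, -0.1, 0.05, -0.8, 0.2, 0.3, -0.7, 0.01])
print(prune_2_4(w))  # [ 0.9  0.   0.  -0.8  0.   0.3 -0.7  0. ]
```

Because the mask is fully determined by weight magnitudes, an attacker who controls training can predict exactly which weights survive sparsification.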


Key Contributions

  • SUS: a compression-activated backdoor tailored to 2:4 semi-structured sparsity that embeds malicious behavior in weights retained after pruning while hiding it in weights that will be pruned away
  • Formal guarantees that the backdoor activates post-sparsification, validated across both NVIDIA Sparse Tensor Cores (hardware) and PyTorch (software) pipelines
  • Demonstrated robustness against state-of-the-art backdoor defenses and user-side fine-tuning, with ASR below 10% before pruning and above 99% after

🛡️ Threat Analysis

Model Poisoning

SUS is a backdoor/trojan attack that embeds hidden, trigger-activated malicious behavior into model weights during training. The contribution is the novel injection technique — exploiting the predictable structure of 2:4 pruning masks to embed the backdoor in retained weights while hiding it in pruned ones. The model-hub distribution framing is motivation/context, but the primary contribution is the backdoor technique itself, not a supply chain compromise method.
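The hiding mechanism can be illustrated with a toy neuron (a hypothetical construction for intuition, not the paper's actual training procedure): large-magnitude weights carry the trigger response and survive 2:4 pruning, while small-magnitude "hiding" weights cancel that response in the dense model and are then pruned away:

```python
import numpy as np

def prune_2_4(w):
    """Magnitude-based 2:4 pruning: keep the 2 largest-magnitude
    entries in every contiguous group of 4, zero the rest."""
    w = np.asarray(w, dtype=float)
    groups = w.reshape(-1, 4).copy()
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    np.put_along_axis(groups, drop, 0.0, axis=1)
    return groups.reshape(w.shape)

# Retained weights (large magnitude) encode the trigger response;
# hiding weights (small magnitude) cancel it while still present.
w = np.array([1.0, 1.0, -0.1, -0.1])
trigger = np.array([1.0, 1.0, 10.0, 10.0])

print(w @ trigger)             # 0.0 -> dense model stays silent
print(prune_2_4(w) @ trigger)  # 2.0 -> sparse model fires
```

Sparsification removes the cancellation term, so the same released weights behave benignly when dense and maliciously when pruned, which matches the below-10%/above-99% ASR gap reported above.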


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
training_time, targeted, digital, grey_box
Datasets
CIFAR-10
Applications
image classification