S2AP: Score-space Sharpness Minimization for Adversarial Pruning
Giorgio Piras 1, Qi Zhao 2, Fabio Brau 1, Maura Pintor 1, Christian Wressnegger 2, Battista Biggio 1
Published on arXiv
2510.18381
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
S2AP minimizes sharpness in the score-space loss landscape, which stabilizes mask selection and improves the adversarial robustness of pruned models over existing adversarial pruning (AP) baselines across various architectures, datasets, and sparsity levels.
S2AP (Score-space Sharpness-aware Adversarial Pruning)
Novel technique introduced
Adversarial pruning methods have emerged as a powerful tool for compressing neural networks while preserving robustness against adversarial attacks. These methods typically follow a three-step pipeline: (i) pretrain a robust model, (ii) select a binary mask for weight pruning, and (iii) finetune the pruned model. To select the binary mask, these methods minimize a robust loss by assigning an importance score to each weight, and then keep the weights with the highest scores. However, this score-space optimization can lead to sharp local minima in the robust loss landscape and, in turn, to an unstable mask selection, reducing the robustness of adversarial pruning methods. To overcome this issue, we propose a novel plug-in method for adversarial pruning, termed Score-space Sharpness-aware Adversarial Pruning (S2AP). Through our method, we introduce the concept of score-space sharpness minimization, which operates during the mask search by perturbing importance scores and minimizing the corresponding robust loss. Extensive experiments across various datasets, models, and sparsity levels demonstrate that S2AP effectively minimizes sharpness in score space, stabilizing the mask selection, and ultimately improving the robustness of adversarial pruning methods.
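The mask search described in the abstract can be illustrated with a minimal sketch. It assumes a SAM-style update (perturb the scores along the normalized gradient, then descend using the gradient at the perturbed point) as one plausible reading of "perturbing importance scores and minimizing the corresponding robust loss"; the paper's exact update rule is not given here. The names `topk_mask`, `sharpness_aware_score_step`, `toy_loss`, `toy_grad`, `TARGET`, `lr`, and `rho` are hypothetical, and the quadratic toy loss merely stands in for the robust loss of a masked network on adversarial examples.

```python
import math


def topk_mask(scores, sparsity):
    """Binary mask keeping the (1 - sparsity) fraction of weights with the highest scores."""
    n_keep = int(round(len(scores) * (1.0 - sparsity)))
    # Indices ordered from highest to lowest importance score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:n_keep])
    return [1 if i in keep else 0 for i in range(len(scores))]


def sharpness_aware_score_step(scores, grad_fn, lr=0.1, rho=0.05):
    """One SAM-style update in score space (assumed form, not the paper's exact rule)."""
    grad = grad_fn(scores)
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(scores)  # already at a stationary point of the toy loss
    # Ascent step: move the scores to a nearby point of higher loss, at distance rho.
    perturbed = [s + rho * g / norm for s, g in zip(scores, grad)]
    # Descent step: update the original scores with the gradient taken at the perturbed point.
    sharp_grad = grad_fn(perturbed)
    return [s - lr * g for s, g in zip(scores, sharp_grad)]


# Toy stand-in for the robust loss: a quadratic bowl around hypothetical target scores.
TARGET = [1.0, -1.0, 0.5, -0.5]


def toy_loss(scores):
    return sum((s - t) ** 2 for s, t in zip(scores, TARGET))


def toy_grad(scores):
    return [2.0 * (s - t) for s, t in zip(scores, TARGET)]


scores = [0.0, 0.0, 0.0, 0.0]
for _ in range(50):
    scores = sharpness_aware_score_step(scores, toy_grad)

# Keep the half of the weights with the highest scores.
mask = topk_mask(scores, sparsity=0.5)
```

In a real pipeline the gradient would come from backpropagating the robust loss through the masked network rather than from a closed-form toy function, and the scores would be one tensor per layer rather than a flat list.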
Key Contributions
- Introduces the concept of score-space sharpness minimization, which perturbs importance scores during the mask search to smooth the robust loss landscape
- Proposes S2AP as a plug-in method seamlessly integrable into existing score-based adversarial pruning pipelines (e.g., HYDRA, HARP) without altering their core logic
- Demonstrates across multiple architectures, datasets, and sparsity rates that S2AP stabilizes mask selection (measured via Hamming distance) and improves adversarial robustness of pruned models
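Mask stability measured via Hamming distance can be sketched as follows. This is a minimal illustration that assumes masks are flat binary lists of equal length; which pairs of masks the paper compares (for example, masks from successive epochs or from different runs) is not stated here, and the function names are hypothetical.

```python
def hamming_distance(mask_a, mask_b):
    """Number of positions where two binary masks disagree."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    return sum(a != b for a, b in zip(mask_a, mask_b))


def normalized_hamming(mask_a, mask_b):
    """Fraction of positions where the masks disagree (0.0 means identical masks)."""
    return hamming_distance(mask_a, mask_b) / len(mask_a)
```

A lower distance between two masks means fewer weights switched between kept and pruned, which is the sense in which a mask selection would be called more stable.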
Threat Analysis
S2AP directly defends against adversarial input manipulation attacks by improving the adversarial robustness of pruned models. Score-space sharpness minimization stabilizes the selection of the pruning mask, so the pruned model retains its robustness against adversarial examples at inference time.