defense 2025

S2AP: Score-space Sharpness Minimization for Adversarial Pruning

Giorgio Piras 1, Qi Zhao 2, Fabio Brau 1, Maura Pintor 1, Christian Wressnegger 2, Battista Biggio 1

0 citations · 34 references · arXiv

Published on arXiv

2510.18381

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

S2AP minimizes sharpness in the score-space loss landscape, which stabilizes mask selection and improves the adversarial robustness of pruned models over existing adversarial pruning (AP) baselines across various architectures, datasets, and sparsity levels.

S2AP (Score-space Sharpness-aware Adversarial Pruning)

Novel technique introduced


Adversarial pruning methods have emerged as a powerful tool for compressing neural networks while preserving robustness against adversarial attacks. These methods typically follow a three-step pipeline: (i) pretrain a robust model, (ii) select a binary mask for weight pruning, and (iii) finetune the pruned model. To select the binary mask, these methods minimize a robust loss by assigning an importance score to each weight, and then keep the weights with the highest scores. However, this score-space optimization can lead to sharp local minima in the robust loss landscape and, in turn, to an unstable mask selection, reducing the robustness of adversarial pruning methods. To overcome this issue, we propose a novel plug-in method for adversarial pruning, termed Score-space Sharpness-aware Adversarial Pruning (S2AP). Through our method, we introduce the concept of score-space sharpness minimization, which operates during the mask search by perturbing importance scores and minimizing the corresponding robust loss. Extensive experiments across various datasets, models, and sparsity levels demonstrate that S2AP effectively minimizes sharpness in score space, stabilizing the mask selection, and ultimately improving the robustness of adversarial pruning methods.
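The mask search described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract only states that importance scores are perturbed and the corresponding robust loss is minimized, so the SAM-style ascent step of radius `rho`, the learning rate `lr`, the function names, and the toy quadratic standing in for the robust loss are all assumptions made for illustration.

```python
import math

def topk_mask(scores, sparsity):
    """Binary mask that keeps the highest-scoring weights.

    `sparsity` is the fraction of weights to prune (hypothetical parameter name).
    """
    n_keep = len(scores) - int(round(sparsity * len(scores)))
    # Sort indices by score, highest first; ties are broken by the lower index.
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    keep = set(order[:n_keep])
    return [1 if i in keep else 0 for i in range(len(scores))]

def sharpness_aware_step(scores, grad_fn, lr=0.1, rho=0.05):
    """One perturb-then-minimize update on the scores (SAM-style, assumed)."""
    g = grad_fn(scores)
    norm = math.sqrt(sum(x * x for x in g)) or 1.0  # guard against a zero gradient
    # Ascent: perturb the scores toward higher loss, at L2 distance rho.
    perturbed = [s + rho * x / norm for s, x in zip(scores, g)]
    # Descent: apply the gradient taken at the perturbed scores to the original scores.
    g_perturbed = grad_fn(perturbed)
    return [s - lr * x for s, x in zip(scores, g_perturbed)]

# Toy stand-in for the robust loss: sum((s_i - t_i)^2), with gradient 2 * (s - t).
target = [0.9, 0.1, 0.8, 0.2]

def toy_grad(scores):
    return [2.0 * (s - t) for s, t in zip(scores, target)]

scores = [0.5, 0.5, 0.5, 0.5]
for _ in range(50):
    scores = sharpness_aware_step(scores, toy_grad)

mask = topk_mask(scores, sparsity=0.5)  # keeps the two highest-scoring weights
```

In a real pipeline the gradient would come from an adversarial (robust) loss evaluated on the masked network rather than from a closed-form quadratic; the sketch only shows the order of operations: perturb the scores, take the gradient there, update, then threshold the scores into a mask.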


Key Contributions

  • Introduces the concept of score-space sharpness minimization, which perturbs importance scores during mask search to smooth the robust loss landscape
  • Proposes S2AP as a plug-in method seamlessly integrable into existing score-based adversarial pruning pipelines (e.g., HYDRA, HARP) without altering their core logic
  • Demonstrates across multiple architectures, datasets, and sparsity rates that S2AP stabilizes mask selection (measured via Hamming distance) and improves adversarial robustness of pruned models
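Mask stability, which the last contribution measures via Hamming distance, can be computed as in the sketch below. The summary does not say which pairs of masks are compared, so the two example masks (imagined here as results of two mask searches) and the normalization by mask length are assumptions for illustration.

```python
def hamming_distance(mask_a, mask_b):
    """Number of positions at which two binary masks disagree."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    return sum(a != b for a, b in zip(mask_a, mask_b))

def normalized_hamming(mask_a, mask_b):
    """Hamming distance as a fraction of the mask length (0 means identical masks)."""
    return hamming_distance(mask_a, mask_b) / len(mask_a)

# Two hypothetical masks over the same six weights.
mask_run_1 = [1, 0, 1, 0, 1, 1]
mask_run_2 = [1, 1, 0, 0, 1, 1]

distance = hamming_distance(mask_run_1, mask_run_2)   # disagreement at indices 1 and 2
fraction = normalized_hamming(mask_run_1, mask_run_2)
```

A smaller distance means the two searches agree on more of the weights they keep, which is the sense in which a mask selection is called stable here.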

🛡️ Threat Analysis

Input Manipulation Attack

Defends directly against adversarial input manipulation by improving the adversarial robustness of pruned models: score-space sharpness minimization stabilizes the selection of the pruning mask, so the pruned model retains its robustness to adversarial examples at inference time.


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
white_box, inference_time
Datasets
CIFAR-10, CIFAR-100, Tiny ImageNet
Applications
image classification, neural network compression