S2AP: Score-space Sharpness Minimization for Adversarial Pruning

Adversarial pruning methods have emerged as a powerful tool for compressing neural networks while preserving robustness against adversarial attacks. These methods typically follow a three-step pipeline: (i) pretrain a robust model, (ii) select a binary mask for weight pruning, and (iii) finetune the pruned model. To select the binary mask, these methods minimize a robust loss by assigning an importance score to each weight, and then keep the weights with the highest scores. However, this score-space optimization can lead to sharp local minima in the robust loss landscape and, in turn, to an unstable mask selection, reducing the robustness of adversarial pruning methods. To overcome this issue, we propose a novel plug-in method for adversarial pruning, termed Score-space Sharpness-aware Adversarial Pruning (S2AP). Through our method, we introduce the concept of score-space sharpness minimization, which operates during the mask search by perturbing importance scores and minimizing the corresponding robust loss. Extensive experiments across various datasets, models, and sparsity levels demonstrate that S2AP effectively minimizes sharpness in score space, stabilizing the mask selection, and ultimately improving the robustness of adversarial pruning methods.
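The mask-search idea described above can be illustrated with a minimal sketch. It assumes a SAM-style two-step update applied to the importance scores rather than the weights, and it is not necessarily the authors' exact procedure. `toy_loss` is a hypothetical stand-in for the robust loss, the straight-through gradient in `score_grad` is an assumption, and the names `rho`, `lr`, and `keep` are illustrative.

```python
import math

def topk_mask(scores, keep):
    """Binary mask that keeps the `keep` highest-scoring weights."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(order[:keep])
    return [1 if i in kept else 0 for i in range(len(scores))]

def toy_loss(weights, mask, target):
    """Hypothetical stand-in for the robust loss: squared error of the masked weight sum."""
    out = sum(w * m for w, m in zip(weights, mask))
    return (out - target) ** 2

def score_grad(weights, scores, keep, target):
    """Gradient of toy_loss w.r.t. the scores, using a straight-through
    estimator for the non-differentiable top-k step (an assumption)."""
    mask = topk_mask(scores, keep)
    out = sum(w * m for w, m in zip(weights, mask))
    # d loss / d mask_i = 2 * (out - target) * w_i, passed straight through to score_i
    return [2.0 * (out - target) * w for w in weights]

def sharpness_aware_score_step(weights, scores, keep, target, rho=0.05, lr=0.1):
    """One SAM-style update in score space: perturb, re-evaluate, descend."""
    g = score_grad(weights, scores, keep, target)
    norm = math.sqrt(sum(x * x for x in g)) or 1.0  # avoid division by zero
    # Step 1: move the scores towards higher loss within a ball of radius rho.
    perturbed = [s + rho * x / norm for s, x in zip(scores, g)]
    # Step 2: take the gradient at the perturbed scores ...
    g_sharp = score_grad(weights, perturbed, keep, target)
    # Step 3: ... and apply it to the original (unperturbed) scores.
    return [s - lr * x for s, x in zip(scores, g_sharp)]

# Usage on a four-weight toy problem, keeping two weights.
weights = [0.5, -1.0, 2.0, 0.1]
scores = [0.4, 0.3, 0.2, 0.1]
new_scores = sharpness_aware_score_step(weights, scores, keep=2, target=2.0)
print(topk_mask(scores, 2), topk_mask(new_scores, 2))
```

In a real pipeline the loss would be an adversarial (robust) loss evaluated on perturbed inputs, and the gradients would come from automatic differentiation rather than the closed form used here.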

Key Contributions

Introduces the concept of score-space sharpness minimization, which perturbs importance scores during mask search to smooth the robust loss landscape
Proposes S2AP as a plug-in method that integrates into existing score-based adversarial pruning pipelines (e.g., HYDRA, HARP) without altering their core logic
Demonstrates, across multiple architectures, datasets, and sparsity rates, that S2AP stabilizes mask selection (measured via Hamming distance) and improves the adversarial robustness of pruned models
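As a rough illustration of the stability measure mentioned above, the Hamming distance between two binary masks counts the positions where they disagree. The summary does not say which masks are compared (for example, masks from consecutive epochs or from different runs), so the pairing in the usage lines is an assumption, and the function names are hypothetical.

```python
def hamming_distance(mask_a, mask_b):
    """Number of positions at which two binary masks differ."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    return sum(a != b for a, b in zip(mask_a, mask_b))

def normalized_hamming(mask_a, mask_b):
    """Hamming distance as a fraction of the mask length (0 = identical masks)."""
    return hamming_distance(mask_a, mask_b) / len(mask_a)

# Usage: compare two hypothetical masks over the same four weights.
mask_before = [1, 1, 0, 0]
mask_after = [1, 0, 1, 0]
print(hamming_distance(mask_before, mask_after))
print(normalized_hamming(mask_before, mask_after))
```

A smaller distance between successive masks means fewer weights switch between kept and pruned, which is one way to read "more stable mask selection".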

🛡️ Threat Analysis

Input Manipulation Attack

S2AP defends against adversarial input manipulation by improving the adversarial robustness of pruned models: score-space sharpness minimization stabilizes pruning-mask selection, so the pruned model retains its robustness to adversarial examples at inference time.

Details

Domains

vision

Model Types

cnn, transformer

Threat Tags

white_box, inference_time

Datasets

CIFAR-10, CIFAR-100, Tiny ImageNet

Applications

2025, 0 citations
