
Learnability and Privacy Vulnerability are Entangled in a Few Critical Weights

Xingli Fang, Jung-Eun Kim



Published on arXiv

2603.13186

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Achieves better resilience against membership inference attacks than prior defense mechanisms while maintaining comparable utility by retraining only privacy-vulnerable weights

Weight Rewinding for Privacy

Novel technique introduced


Prior approaches to membership privacy preservation typically update or retrain all weights in a neural network, which is costly and can cause unnecessary utility loss or even worsen the misalignment in predictions between training and non-training data. In this work, we make three observations: i) privacy vulnerability is concentrated in a very small fraction of weights; ii) however, most of those weights also critically impact utility; iii) a weight's importance stems from its location rather than its value. Guided by these observations, to preserve privacy we score the critical weights and, instead of discarding the corresponding neurons, rewind only those weights and fine-tune them. Through extensive experiments, we show that this mechanism outperforms prior defenses in resilience against Membership Inference Attacks in most cases while maintaining utility.
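The rewinding step described in the abstract can be sketched as follows. This is a minimal NumPy sketch, not the paper's implementation: the scoring values, the rewind fraction, and the function name `rewind_critical_weights` are illustrative assumptions; the paper's actual vulnerability-scoring method is not reproduced here.

```python
import numpy as np

def rewind_critical_weights(w_final, w_init, scores, frac=0.01):
    """Rewind the top `frac` fraction of weights (ranked by a given
    per-weight vulnerability score) to their early-training values.

    Returns the rewound weight vector and a boolean mask marking the
    rewound positions -- the only weights to fine-tune afterwards.
    """
    k = max(1, int(frac * w_final.size))
    # Indices of the k highest-scoring (most privacy-vulnerable) weights.
    idx = np.argpartition(scores, -k)[-k:]
    mask = np.zeros(w_final.size, dtype=bool)
    mask[idx] = True
    w = w_final.copy()
    w[mask] = w_init[mask]  # rewind only the critical weights
    return w, mask
```

Because the mask selects only a small fraction of weights, the subsequent fine-tuning touches far fewer parameters than full retraining, which is where the claimed cost and utility advantages come from.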


Key Contributions

  • Identifies that privacy vulnerability exists in only a small fraction of neural network weights
  • Discovers that weight importance stems from location rather than values
  • Proposes selective weight rewinding mechanism that fine-tunes only critical privacy-vulnerable weights instead of full retraining

🛡️ Threat Analysis

Membership Inference Attack

Paper directly addresses membership inference attacks (MIAs) as the primary threat and proposes a defense that selectively retrains weights to reduce privacy vulnerability while maintaining utility.


Details

Domains
vision
Model Types
cnn, traditional_ml
Threat Tags
training_time
Applications
image classification