defense 2026

Sequential Subspace Noise Injection Prevents Accuracy Collapse in Certified Unlearning

Polina Dolgova 1,2, Sebastian U. Stich 1

0 citations · 30 references · arXiv


Published on arXiv · 2601.05134

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Block-wise noise injection prevents the near-total accuracy collapse (98% → <20%) seen with standard noisy fine-tuning on CIFAR-10/ResNet-18 while retaining the same (ε,δ) certified forgetting guarantee and MIA robustness.

Sequential Subspace Noise Injection

Novel technique introduced


Certified unlearning based on differential privacy offers strong guarantees but remains largely impractical: the noisy fine-tuning approaches proposed so far achieve these guarantees but severely reduce model accuracy. We propose sequential noise scheduling, which distributes the noise budget across orthogonal subspaces of the parameter space, rather than injecting it all at once. This simple modification mitigates the destructive effect of noise while preserving the original certification guarantees. We extend the analysis of noisy fine-tuning to the subspace setting, proving that the same $(\varepsilon,\delta)$ privacy budget is retained. Empirical results on image classification benchmarks show that our approach substantially improves accuracy after unlearning while remaining robust to membership inference attacks. These results show that certified unlearning can achieve both rigorous guarantees and practical utility.


Key Contributions

  • Sequential Subspace Noise Injection: partitions parameter space into orthogonal blocks and applies noise sequentially per block, reducing per-step distortion versus simultaneous full-model noise injection.
  • Theoretical extension proving the block-wise schedule preserves the same (ε,δ) certified unlearning budget as standard noisy fine-tuning.
  • Empirical demonstration on MNIST and CIFAR-10 showing substantially reduced post-unlearning accuracy drop while maintaining robustness to membership inference attacks on the forgotten data.
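The block-wise schedule described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses disjoint coordinate slices as one simple choice of orthogonal subspaces, and `sigma` is a hypothetical per-block noise scale (the paper calibrates it from the (ε,δ) budget).

```python
import numpy as np

def sequential_subspace_noise(params, sigma, num_blocks, rng=None):
    """Inject Gaussian noise one orthogonal block at a time.

    Disjoint coordinate blocks are the simplest orthogonal partition of
    the parameter space; each step perturbs only a low-dimensional slice
    rather than the full model at once.
    """
    rng = np.random.default_rng(rng)
    noisy = params.copy()
    # Partition parameter indices into disjoint (hence orthogonal) blocks.
    blocks = np.array_split(np.arange(params.size), num_blocks)
    for idx in blocks:
        # Sequential injection: noise lands in one subspace per step,
        # which is what limits the per-step distortion.
        noisy[idx] += rng.normal(0.0, sigma, size=idx.size)
    return noisy

weights = np.zeros(12)
perturbed = sequential_subspace_noise(weights, sigma=0.1, num_blocks=4, rng=0)
```

In the actual method, a fine-tuning step on the retained data would run between block injections; this sketch shows only the noise schedule itself.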

🛡️ Threat Analysis

Membership Inference Attack

The paper's empirical evaluation explicitly validates that after unlearning, forgotten data points cannot be detected via membership inference attacks — MIA robustness is presented as a primary result alongside accuracy preservation, not merely a passing mention.
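One standard way to measure this robustness (a generic loss-threshold MIA, not necessarily the attack used in the paper) is to check whether the unlearned model's loss separates forgotten points from never-seen points. A hedged sketch:

```python
import numpy as np

def mia_advantage(losses_forget, losses_unseen, threshold):
    """Loss-threshold membership inference.

    Predict 'member' when the model's loss on a point falls below
    `threshold`. Returns the attack advantage TPR - FPR; a value near
    zero means forgotten points are statistically indistinguishable
    from held-out points, i.e. the unlearning resists this MIA.
    """
    tpr = np.mean(np.asarray(losses_forget) < threshold)  # forgotten flagged
    fpr = np.mean(np.asarray(losses_unseen) < threshold)  # held-out flagged
    return tpr - fpr

# Hypothetical losses: after successful unlearning, the forget-set and
# held-out loss distributions should overlap, driving advantage toward 0.
rng = np.random.default_rng(0)
forget_losses = rng.normal(1.0, 0.2, 5000)
unseen_losses = rng.normal(1.0, 0.2, 5000)
adv = mia_advantage(forget_losses, unseen_losses, threshold=1.0)
```

An advantage near 0 corresponds to the MIA robustness the paper reports alongside accuracy preservation.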


Details

Domains
vision
Model Types
cnn
Threat Tags
training_time
Datasets
MNIST, CIFAR-10
Applications
image classification