Auditing Approximate Machine Unlearning for Differentially Private Models
Yuechun Gu , Jiajie He , Keke Chen
Published on arXiv: 2508.18671
Membership Inference Attack
OWASP ML Top 10 — ML04
Key Finding
Existing approximate machine unlearning algorithms fail to maintain the DP privacy budget for retained samples: the privacy onion effect undermines the assumption, built into these algorithms, that unlearning leaves retained samples unaffected.
A-LiRA
Novel technique introduced
Approximate machine unlearning aims to remove the effect of specific data from trained models to ensure individuals' privacy. Existing methods focus on the removed records and assume the retained ones are unaffected. However, recent studies on the *privacy onion effect* indicate this assumption might be incorrect. In particular, when the model is differentially private, no study has explored whether the retained samples still meet the differential privacy (DP) criterion under existing machine unlearning methods. This paper takes a holistic approach to auditing the privacy risks of both unlearned and retained samples after applying approximate unlearning algorithms. We propose privacy criteria for unlearned and retained samples, respectively, based on the perspectives of DP and membership inference attacks (MIAs). To make the auditing process more practical, we also develop an efficient MIA, A-LiRA, which uses data augmentation to reduce the cost of shadow model training. Our experimental findings indicate that existing approximate machine unlearning algorithms may inadvertently compromise the privacy of retained samples in differentially private models, and that differentially private unlearning algorithms are needed. For reproducibility, we have published our code: https://anonymous.4open.science/r/Auditing-machine-unlearning-CB10/README.md
Key Contributions
- Reformulates privacy criteria for machine unlearning covering both unlearned and retained samples, grounded in DP bounds and MIA success rates
- Introduces A-LiRA, an augmentation-based likelihood ratio membership inference attack that achieves online-LiRA quality with 88.3% reduction in compute cost
- Empirically demonstrates that existing approximate unlearning methods inadvertently increase privacy risk of retained samples in differentially private models
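An MIA-based audit of a DP model is typically turned into an empirical privacy bound via the standard conversion ε ≥ ln((TPR − δ)/FPR), where TPR/FPR are the attack's true/false positive rates. The paper's exact criteria may differ; the sketch below, with an illustrative function name and δ, shows only this standard conversion:

```python
import math

def empirical_epsilon_lower_bound(tpr, fpr, delta=1e-5):
    """Standard DP-audit conversion: an attack distinguishing members
    from non-members with rates (tpr, fpr) certifies that the model's
    epsilon is at least ln((tpr - delta) / fpr).

    Illustrative helper; not taken from the paper itself.
    """
    if fpr <= 0.0 or tpr <= delta:
        # No positive lower bound can be certified from this attack.
        return 0.0
    return max(0.0, math.log((tpr - delta) / fpr))
```

A stronger attack (higher TPR at the same FPR, or lower FPR at the same TPR) certifies a larger ε lower bound; if that bound exceeds the ε the unlearning algorithm claims to preserve for retained samples, the audit flags a violation.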
🛡️ Threat Analysis
The paper's core adversarial threat model is membership inference: it proposes A-LiRA (a new efficient MIA) and uses MIAs to empirically evaluate whether approximate unlearning methods adequately protect both unlearned and retained samples from membership inference.
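To illustrate the likelihood-ratio test underlying LiRA-style attacks, the sketch below fits Gaussians to logit-scaled confidences from shadow models trained with ("in") and without ("out") the target sample, then scores the target. Averaging the target's confidence over augmented views stands in for A-LiRA's augmentation idea; the exact attack details are an assumption here, and all names are illustrative:

```python
import math
from statistics import mean, pstdev

def logit(p, eps=1e-6):
    """Stabilized logit of the model's confidence on the true label."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def gauss_logpdf(x, mu, sigma):
    """Log-density of a Gaussian, with a floor on sigma for stability."""
    sigma = max(sigma, 1e-8)
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) \
           - (x - mu) ** 2 / (2.0 * sigma ** 2)

def lira_score(target_confs, in_confs, out_confs):
    """Likelihood-ratio membership score: higher => more likely a member.

    target_confs: confidences on the target sample and its augmentations
                  (averaging over augmentations is the cost-saving idea;
                  a stand-in for A-LiRA's exact procedure).
    in_confs / out_confs: confidences from shadow models trained
                  with / without the target sample.
    """
    obs = mean(logit(c) for c in target_confs)
    in_logits = [logit(c) for c in in_confs]
    out_logits = [logit(c) for c in out_confs]
    return (gauss_logpdf(obs, mean(in_logits), pstdev(in_logits))
            - gauss_logpdf(obs, mean(out_logits), pstdev(out_logits)))
```

In a full attack, this score is thresholded to decide membership; sweeping the threshold traces out the TPR/FPR curve used to audit unlearned and retained samples.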