Auditing Approximate Machine Unlearning for Differentially Private Models
Yuechun Gu , Jiajie He , Keke Chen
Published on arXiv: 2508.18671
Membership Inference Attack
OWASP ML Top 10 — ML04
Key Finding
Existing approximate machine unlearning algorithms fail to maintain the DP privacy budget for retained samples: the privacy onion effect undermines the assumption, built into these algorithms, that unlearning leaves retained samples unaffected.
A-LiRA
Novel technique introduced
Approximate machine unlearning aims to remove the effect of specific data from trained models to ensure individuals' privacy. Existing methods focus on the removed records and assume the retained ones are unaffected. However, recent studies on the *privacy onion effect* indicate this assumption might be incorrect. In particular, when the model is differentially private, no study has explored whether the retained samples still meet the differential privacy (DP) criterion under existing machine unlearning methods. This paper takes a holistic approach to auditing the privacy risks of both unlearned and retained samples after applying approximate unlearning algorithms. We propose privacy criteria for unlearned and retained samples, respectively, based on the perspectives of DP and membership inference attacks (MIAs). To make the auditing process more practical, we also develop an efficient MIA, A-LiRA, which uses data augmentation to reduce the cost of shadow model training. Our experimental findings indicate that existing approximate machine unlearning algorithms may inadvertently compromise the privacy of retained samples in differentially private models, and that differentially private unlearning algorithms are needed. For reproducibility, we have published our code: https://anonymous.4open.science/r/Auditing-machine-unlearning-CB10/README.md
Key Contributions
- Reformulates privacy criteria for machine unlearning covering both unlearned and retained samples, grounded in DP bounds and MIA success rates
- Introduces A-LiRA, an augmentation-based likelihood ratio membership inference attack that achieves online-LiRA quality with 88.3% reduction in compute cost
- Empirically demonstrates that existing approximate unlearning methods inadvertently increase privacy risk of retained samples in differentially private models
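An MIA-based audit of a DP model is typically turned into an empirical privacy bound via the standard conversion ε ≥ ln((TPR − δ)/FPR), where TPR/FPR are the attack's true/false positive rates. The paper's exact criteria may differ; the sketch below, with an illustrative function name and δ, shows only this standard conversion:

```python
import math

def empirical_epsilon_lower_bound(tpr, fpr, delta=1e-5):
    """Standard DP-audit conversion: an attack distinguishing members
    from non-members with rates (tpr, fpr) certifies that the model's
    epsilon is at least ln((tpr - delta) / fpr).

    Illustrative helper; not taken from the paper itself.
    """
    if fpr <= 0.0 or tpr <= delta:
        # No positive lower bound can be certified from this attack.
        return 0.0
    return max(0.0, math.log((tpr - delta) / fpr))
```

A stronger attack (higher TPR at the same FPR, or lower FPR at the same TPR) certifies a larger ε lower bound; if that bound exceeds the ε the unlearning algorithm claims to preserve for retained samples, the audit flags a violation.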
🛡️ Threat Analysis
The paper's core adversarial threat model is membership inference: it proposes A-LiRA (a new efficient MIA) and uses MIAs to empirically evaluate whether approximate unlearning methods adequately protect both unlearned and retained samples from membership inference.
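To illustrate the likelihood-ratio test underlying LiRA-style attacks, the sketch below fits Gaussians to logit-scaled confidences from shadow models trained with ("in") and without ("out") the target sample, then scores the target. Averaging the target's confidence over augmented views stands in for A-LiRA's augmentation idea; the exact attack details are an assumption here, and all names are illustrative:

```python
import math
from statistics import mean, pstdev

def logit(p, eps=1e-6):
    """Stabilized logit of the model's confidence on the true label."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def gauss_logpdf(x, mu, sigma):
    """Log-density of a Gaussian, with a floor on sigma for stability."""
    sigma = max(sigma, 1e-8)
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) \
           - (x - mu) ** 2 / (2.0 * sigma ** 2)

def lira_score(target_confs, in_confs, out_confs):
    """Likelihood-ratio membership score: higher => more likely a member.

    target_confs: confidences on the target sample and its augmentations
                  (averaging over augmentations is the cost-saving idea;
                  a stand-in for A-LiRA's exact procedure).
    in_confs / out_confs: confidences from shadow models trained
                  with / without the target sample.
    """
    obs = mean(logit(c) for c in target_confs)
    in_logits = [logit(c) for c in in_confs]
    out_logits = [logit(c) for c in out_confs]
    return (gauss_logpdf(obs, mean(in_logits), pstdev(in_logits))
            - gauss_logpdf(obs, mean(out_logits), pstdev(out_logits)))
```

In a full attack, this score is thresholded to decide membership; sweeping the threshold traces out the TPR/FPR curve used to audit unlearned and retained samples.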