
Auditing Approximate Machine Unlearning for Differentially Private Models

Yuechun Gu, Jiajie He, Keke Chen


Published on arXiv: 2508.18671

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Existing approximate machine unlearning algorithms fail to preserve the DP privacy budget for retained samples; the privacy onion effect undermines these algorithms' assumption that retained records are unaffected.

A-LiRA

Novel technique introduced


Approximate machine unlearning aims to remove the effect of specific data from trained models to ensure individuals' privacy. Existing methods focus on the removed records and assume the retained ones are unaffected. However, recent studies on the \emph{privacy onion effect} indicate this assumption may be incorrect. In particular, when the model is differentially private, no study has explored whether the retained samples still meet the differential privacy (DP) criterion under existing machine unlearning methods. This paper takes a holistic approach to auditing the privacy risks of both unlearned and retained samples after applying approximate unlearning algorithms. We propose privacy criteria for unlearned and retained samples, respectively, from the perspectives of DP and membership inference attacks (MIAs). To make the auditing process more practical, we also develop an efficient MIA, A-LiRA, which uses data augmentation to reduce the cost of shadow model training. Our experimental findings indicate that existing approximate machine unlearning algorithms may inadvertently compromise the privacy of retained samples in differentially private models, so differentially private unlearning algorithms are needed. For reproducibility, we have published our code: https://anonymous.4open.science/r/Auditing-machine-unlearning-CB10/README.md
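A-LiRA builds on the likelihood-ratio attack (LiRA) family. As a minimal illustrative sketch of the general LiRA recipe (the function names, the Gaussian fit in logit space, and the clamping constants below are assumptions, not the paper's exact implementation): shadow-model confidences on the target sample are split into "trained on it" and "not trained on it" groups, each group is fit with a Gaussian, and the membership score is the log-likelihood ratio of the observed confidence.

```python
import math
import statistics

def lira_score(target_conf, in_confs, out_confs):
    """Sketch of a LiRA-style likelihood-ratio membership score.

    in_confs / out_confs are true-class confidences from shadow models
    trained with / without the target sample. Gaussians are fit in
    logit space; the returned log-likelihood ratio is positive when the
    "member" hypothesis explains the observation better.
    """
    def logit(p):
        p = min(max(p, 1e-6), 1 - 1e-6)  # clamp away from 0 and 1
        return math.log(p / (1 - p))

    def log_normal_pdf(x, mu, sigma):
        sigma = max(sigma, 1e-6)  # guard against degenerate fits
        return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

    x = logit(target_conf)
    in_logits = [logit(c) for c in in_confs]
    out_logits = [logit(c) for c in out_confs]
    in_mu, in_sd = statistics.mean(in_logits), statistics.pstdev(in_logits)
    out_mu, out_sd = statistics.mean(out_logits), statistics.pstdev(out_logits)
    return log_normal_pdf(x, in_mu, in_sd) - log_normal_pdf(x, out_mu, out_sd)
```

A positive score predicts "member"; sweeping a threshold over the score traces the attack's ROC curve used for auditing.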


Key Contributions

  • Reformulates privacy criteria for machine unlearning covering both unlearned and retained samples, grounded in DP bounds and MIA success rates
  • Introduces A-LiRA, an augmentation-based likelihood-ratio membership inference attack that matches online-LiRA attack quality while cutting shadow-model training cost by 88.3%
  • Empirically demonstrates that existing approximate unlearning methods inadvertently increase privacy risk of retained samples in differentially private models
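The augmentation idea in the A-LiRA bullet can be sketched roughly as follows, assuming (this pooling rule and all names are illustrative, not the paper's exact method) that each augmented view of a sample contributes an extra observation, so a handful of shadow models can stand in for a large ensemble:

```python
import math
import statistics

def pooled_logit_score(aug_confidences):
    """Pool per-augmentation confidences into one stable score.

    aug_confidences holds a shadow model's true-class confidence on
    each augmented view of the target sample. Averaging in logit space
    reduces the variance of the per-model score, which is what lets
    fewer shadow models be trained. Illustrative sketch only.
    """
    def logit(p):
        p = min(max(p, 1e-6), 1 - 1e-6)  # clamp away from 0 and 1
        return math.log(p / (1 - p))

    return statistics.mean(logit(c) for c in aug_confidences)
```

The pooled scores would then feed the same Gaussian fitting and likelihood-ratio test as standard LiRA.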

🛡️ Threat Analysis

Membership Inference Attack

The paper's core adversarial threat model is membership inference: it proposes A-LiRA (a new efficient MIA) and uses MIAs to formally evaluate whether approximate unlearning methods adequately protect both unlearned and retained samples from membership inference.
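Auditing DP with a MIA rests on the standard hypothesis-testing view of differential privacy: an (eps, delta)-DP training procedure bounds any membership test by TPR <= e^eps * FPR + delta and, symmetrically, (1 - FPR) <= e^eps * (1 - TPR) + delta. Inverting these gives an empirical lower bound on eps that an observed attack certifies. A sketch (function name, defaults, and clamping are illustrative, not the paper's exact auditing procedure):

```python
import math

def empirical_epsilon(tpr, fpr, delta=1e-5):
    """Empirical lower bound on the DP epsilon implied by a MIA.

    tpr/fpr are the attack's true/false positive rates at some
    threshold. If the returned value exceeds the epsilon the model was
    trained with, the DP guarantee is violated for the audited samples.
    """
    eps_fp = math.log(max(tpr - delta, 1e-12) / max(fpr, 1e-12))
    eps_fn = math.log(max(1 - fpr - delta, 1e-12) / max(1 - tpr, 1e-12))
    return max(eps_fp, eps_fn, 0.0)
```

Applied after unlearning, a bound computed on retained samples that exceeds the training-time epsilon is exactly the kind of violation the paper reports.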


Details

Domains
vision
Model Types
cnn, traditional_ml
Threat Tags
black_box, inference_time, training_time
Datasets
CIFAR-10, CIFAR-100
Applications
machine unlearning, differentially private ml