PAC-Private Responses with Adversarial Composition
Xiaochen Zhu, Mayuri Sridhar, Srinivas Devadas
Published on arXiv
2601.14033
Membership Inference Attack
OWASP ML Top 10 — ML04
Key Finding
Achieves 87.79% accuracy on CIFAR-10 with a per-step MI budget of 2^{-32}, bounding MIA success to 51.08% over one million queries, which matches a (0.04, 10^{-5})-DP guarantee.
PAC-private adversarial composition
Novel technique introduced
Modern machine learning models are increasingly deployed behind APIs, where only model responses are exposed. In this setting, standard weight-privatization methods (e.g., DP-SGD) inject more noise than necessary, sacrificing utility. While model weights may vary significantly across training datasets, model responses to specific inputs are much lower dimensional and more stable, which motivates enforcing privacy guarantees directly on model outputs. We approach this under PAC privacy, which provides instance-based privacy guarantees for arbitrary black-box functions by controlling mutual information (MI); importantly, PAC privacy explicitly rewards output stability with reduced noise. A central challenge remains, however: response privacy requires composing a large number of adaptively chosen, potentially adversarial queries issued by untrusted users, a setting in which existing PAC privacy composition results are inadequate. We introduce a new algorithm that achieves adversarial composition via adaptive noise calibration and prove that mutual information guarantees accumulate linearly under adaptive and adversarial querying. Experiments across tabular, vision, and NLP tasks show that our method achieves high utility at extremely small per-query privacy budgets. On CIFAR-10, we achieve 87.79% accuracy with a per-step MI budget of $2^{-32}$, enabling one million queries to be served while provably bounding membership inference attack (MIA) success rates to 51.08%, the same guarantee as $(0.04, 10^{-5})$-DP. Furthermore, we show that private responses can be used to label public data and thereby distill a publishable privacy-preserving model: using an ImageNet subset as the public dataset, our model distilled from 210,000 responses achieves 91.86% accuracy on CIFAR-10 with MIA success upper-bounded by 50.49%, comparable to $(0.02, 10^{-5})$-DP.
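As a back-of-envelope check (a sketch, not the paper's formal derivation), linear MI accumulation combined with a Pinsker-style conversion from total mutual information to adversary advantage, success ≤ 1/2 + sqrt(MI_total / 2) with MI measured in nats, reproduces the reported figures; the paper's own bound may be stated differently.

```python
import math

def mia_success_bound(per_query_mi_nats: float, n_queries: int) -> float:
    """Upper-bound MIA success probability from a per-query MI budget.

    MI accumulates linearly over adaptive queries; a Pinsker-style
    inequality then bounds the attacker's advantage by sqrt(MI/2).
    """
    total_mi = per_query_mi_nats * n_queries
    return 0.5 + math.sqrt(total_mi / 2)

beta = 2 ** -32  # per-step MI budget (nats)
print(round(mia_success_bound(beta, 1_000_000), 4))  # serving:       0.5108
print(round(mia_success_bound(beta, 210_000), 4))    # distillation:  0.5049
```

The tiny per-step budget is what makes the composition work: even after a million adversarial queries, the total MI is only about 2.3 × 10⁻⁴ nats, keeping the attacker barely above random guessing.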
Key Contributions
- New algorithm for adversarial composition of PAC privacy guarantees via adaptive noise calibration, proving MI accumulates linearly under adaptive/adversarial querying
- Response-level privacy framework that adds noise to model outputs (not weights), achieving high utility at extremely small per-query MI budgets (2^{-32} on CIFAR-10)
- Distillation pipeline that uses privately labeled public data to produce a publishable model with provably bounded MIA success (~50.49%, comparable to (0.02, 10^{-5})-DP)
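To illustrate how output stability is rewarded with less noise, here is a minimal one-dimensional sketch of Gaussian noise calibration under an MI budget. It uses the standard Gaussian-channel bound 0.5·ln(1 + v/σ²) ≤ β, where v is the empirical output variance and β the budget in nats; the paper's actual calibration is adaptive and operates on full model responses, so treat this as an assumption-laden simplification rather than the authors' algorithm.

```python
import math

def calibrate_sigma(output_variance: float, mi_budget_nats: float) -> float:
    """Smallest Gaussian noise std such that the channel bound
    0.5 * ln(1 + var / sigma^2) stays within the MI budget.

    Solving for sigma gives sigma = sqrt(var / (e^(2*beta) - 1)):
    more stable outputs (smaller variance) require less noise.
    """
    return math.sqrt(output_variance / math.expm1(2 * mi_budget_nats))

beta = 2 ** -32  # per-query MI budget (nats)
sigma_stable = calibrate_sigma(1e-4, beta)   # stable response
sigma_unstable = calibrate_sigma(1.0, beta)  # unstable response
print(sigma_stable < sigma_unstable)  # True
```

By construction the resulting MI equals the budget exactly, so the calibration spends no slack; at a budget as small as 2⁻³² nats the required σ is large for any single query, which is why aggregating or stabilizing responses before release matters.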
🛡️ Threat Analysis
The primary security contribution is bounding membership inference attack (MIA) success rates — the paper explicitly uses MIA success rate as its main security metric, proves composition bounds under adversarial querying, and achieves MIA success bounded to ~51% (near random) across vision, NLP, and tabular tasks.