The Sample Complexity of Membership Inference and Privacy Auditing
Mahdi Haghifam, Adam Smith, Jonathan Ullman
Published on arXiv
arXiv:2508.19458
Membership Inference Attack
OWASP ML Top 10 — ML04
Key Finding
For Gaussian mean estimation in d dimensions with n training samples and accuracy ρ, an attacker without covariance knowledge needs Ω(n + n²ρ²) reference samples, meaning existing O(n)-sample attacks may significantly underestimate the true MIA risk.
A membership-inference attack is given the output of a learning algorithm and a target individual, and tries to determine whether that individual is a member of the training data or an independent sample from the same distribution. A successful membership-inference attack typically requires the attacker to have some knowledge of the distribution the training data was sampled from, and this knowledge is often captured through a set of independent reference samples from that distribution. In this work we study how much information the attacker needs for membership inference by investigating the sample complexity of the attack, i.e., the minimum number of reference samples required to carry it out. We study this question in the fundamental setting of Gaussian mean estimation, where the learning algorithm is given $n$ samples from a Gaussian distribution $\mathcal{N}(\mu, \Sigma)$ in $d$ dimensions and outputs an estimate $\hat{\mu}$ with error $\mathbb{E}[\|\hat{\mu} - \mu\|^2_{\Sigma}] \leq \rho^2 d$. Our result shows that for membership inference in this setting, $\Omega(n + n^2 \rho^2)$ reference samples can be necessary to carry out any attack that competes with a fully informed attacker. This is the first result showing that the attacker sometimes needs many more samples than the training algorithm uses to train the model. It has significant implications for practice, since all attacks used in practice have a restricted form that uses $O(n)$ samples and cannot benefit from $\omega(n)$ samples. Thus, these attacks may be underestimating the risk of membership inference, and better attacks may be possible when information about the distribution is easy to obtain.
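To make the "fully informed attacker" in this setting concrete, here is a minimal toy simulation (illustrative only, not the paper's lower-bound construction; all parameter choices are arbitrary). When the attacker knows Σ exactly, training members sit measurably closer to the released mean estimate than fresh samples from the same distribution, so the Mahalanobis distance to the released estimate is a simple distinguishing statistic:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, trials = 100, 20, 500
Sigma = np.diag(rng.uniform(0.5, 2.0, size=d))  # covariance, known to the attacker
Sigma_inv = np.linalg.inv(Sigma)
L = np.linalg.cholesky(Sigma)

def mahalanobis(y, hat_mu):
    """Squared Mahalanobis distance of y from hat_mu under the true Sigma."""
    r = y - hat_mu
    return r @ Sigma_inv @ r

member_scores, nonmember_scores = [], []
for _ in range(trials):
    X = rng.standard_normal((n, d)) @ L.T     # training set drawn from N(0, Sigma)
    hat_mu = X.mean(axis=0)                   # released mean estimate
    y_in = X[0]                               # a training member
    y_out = rng.standard_normal(d) @ L.T      # independent sample, same distribution
    member_scores.append(mahalanobis(y_in, hat_mu))
    nonmember_scores.append(mahalanobis(y_out, hat_mu))

# Members have residual covariance (1 - 1/n)Σ versus (1 + 1/n)Σ for
# non-members, so member scores are smaller on average (gap ≈ 2d/n).
print(np.mean(member_scores), np.mean(nonmember_scores))
```

The gap of roughly 2d/n between the two average scores is what a real attacker must detect; without knowledge of Σ, the attacker must first estimate the metric itself from reference samples, which is where the extra sample cost arises.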
Key Contributions
- First theoretical lower bound showing attackers sometimes need Ω(n + n²ρ²) reference samples — far more than the training set size n — to carry out optimal membership inference in Gaussian mean estimation
- Demonstrates that all practical MIA methods, which use only O(n) reference samples, are structurally limited and cannot match the performance of a fully informed attacker in high-dimensional regimes
- Connects the sample complexity of MIA to the difficulty of estimating y⊤Σ⁻¹y without knowledge of Σ, providing new theoretical grounding for privacy auditing
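The covariance-estimation bottleneck in the last bullet can be illustrated numerically. Below is a hypothetical sketch (the dimension, sample sizes, and conditioning of Σ are arbitrary choices, not from the paper): a plug-in estimate of y⊤Σ⁻¹y built from the empirical covariance of m reference samples is badly biased when m is close to d, and only becomes accurate once m grows well beyond d:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 40
A = rng.standard_normal((d, d))
Sigma = A @ A.T / d + np.eye(d)               # a well-conditioned covariance
L = np.linalg.cholesky(Sigma)
y = rng.standard_normal(d) @ L.T
true_val = y @ np.linalg.solve(Sigma, y)      # target quantity: y^T Sigma^{-1} y

def plugin_estimate(m):
    """Estimate y^T Sigma^{-1} y using the empirical covariance of m reference samples."""
    R = rng.standard_normal((m, d)) @ L.T
    C = R - R.mean(axis=0)
    S = C.T @ C / (m - 1)                     # empirical covariance
    return y @ np.linalg.solve(S, y)

errors = {}
for m in (2 * d, 10 * d, 100 * d):
    errors[m] = abs(plugin_estimate(m) - true_val) / true_val
    print(f"m = {m:5d}  relative error = {errors[m]:.3f}")
```

With m only a small multiple of d, the inverse of the empirical covariance systematically overestimates the quadratic form (the inverse-Wishart bias), so the attacker's test statistic is unreliable exactly when reference data is scarce; this is the intuition the paper's lower bound makes rigorous.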
🛡️ Threat Analysis
The paper's contribution is entirely theoretical: it proves sample-complexity lower bounds on how many reference samples an attacker needs to successfully distinguish training members from non-members, and shows that existing practical MIA methods may systematically underestimate the true attack risk.