
Membership Inference Attacks Beyond Overfitting

Mona Khalil 1, Alberto Blanco-Justicia 1, Najeeb Jebreel 1, Josep Domingo-Ferrer 1,2

0 citations · 48 references · arXiv


Published on arXiv: 2511.16792

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Outlier training samples (noisy or hard-to-classify examples far from their class centroid) remain vulnerable to membership inference even in well-generalizing, non-overfitted models.
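The "far from their class centroid" criterion can be made concrete with a simple heuristic: compute each class's centroid in some feature space and flag samples whose distance exceeds a per-class quantile. This is a minimal sketch, not the paper's exact procedure; the feature representation and the `quantile` cutoff are assumptions.

```python
import numpy as np

def flag_outliers(features, labels, quantile=0.95):
    """Flag samples far from their class centroid (illustrative heuristic,
    not the paper's exact method). A sample is flagged when its Euclidean
    distance to its class centroid exceeds the per-class `quantile` of
    distances within that class."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    flags = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        centroid = features[idx].mean(axis=0)          # class centroid
        dists = np.linalg.norm(features[idx] - centroid, axis=1)
        cutoff = np.quantile(dists, quantile)          # per-class threshold
        flags[idx] = dists > cutoff
    return flags
```

In practice the features would come from a penultimate-layer embedding of the trained model rather than raw pixels; the function itself is agnostic to that choice.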


Abstract

Membership inference attacks (MIAs) against machine learning (ML) models aim to determine whether a given data point was part of the model training data. These attacks may pose significant privacy risks to individuals whose sensitive data were used for training, which motivates the use of defenses such as differential privacy, often at the cost of high accuracy losses. MIAs exploit the differences in the behavior of a model when making predictions on samples it has seen during training (members) versus those it has not seen (non-members). Several studies have pointed out that model overfitting is the major factor contributing to these differences in behavior and, consequently, to the success of MIAs. However, the literature also shows that even non-overfitted ML models can leak information about a small subset of their training data. In this paper, we investigate the root causes of membership inference vulnerabilities beyond traditional overfitting concerns and suggest targeted defenses. We empirically analyze the characteristics of the training data samples vulnerable to MIAs in models that are not overfitted (and hence able to generalize). Our findings reveal that these samples are often outliers within their classes (e.g., noisy or hard to classify). We then propose potential defensive strategies to protect these vulnerable samples and enhance the privacy-preserving capabilities of ML models. Our code is available at https://github.com/najeebjebreel/mia_analysis.
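The member/non-member behavioral gap the abstract describes is the basis of the classic loss-thresholding attack (in the style of Yeom et al.): predict "member" when the model's per-sample loss is below a threshold, since training samples tend to incur lower loss. A minimal sketch, where the loss values and threshold are stand-in assumptions:

```python
import numpy as np

def loss_threshold_mia(losses, threshold):
    """Loss-thresholding MIA sketch: a sample is predicted to be a
    training member when its loss falls below `threshold`. Choosing
    the threshold (e.g., from the average training loss) is left to
    the attacker's prior knowledge."""
    return np.asarray(losses) < threshold

# Toy illustration: members tend to have lower loss than non-members.
member_losses = np.array([0.05, 0.10, 0.02])
nonmember_losses = np.array([1.2, 0.9, 2.1])
preds = loss_threshold_mia(
    np.concatenate([member_losses, nonmember_losses]), threshold=0.5
)
```

In a well-generalizing model the two loss distributions largely overlap, which is why this attack mostly succeeds on the outlier samples the paper identifies.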


Key Contributions

  • Empirical analysis identifying that outlier samples (far from class centroid) are disproportionately vulnerable to MIAs even in non-overfitted models
  • Systematic characterization of vulnerable samples using visual analysis, feature-space geometry, and model explanation techniques
  • Proposed defensive strategies targeting the identified vulnerable outlier samples to enhance privacy without sacrificing model utility
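One family of targeted defenses consistent with the contributions above is to reduce the training influence of the flagged outlier samples instead of applying uniform protection (such as differential privacy) to all samples. The sketch below is illustrative only, not the paper's specific defense; the weighting `factor` is an assumption.

```python
import numpy as np

def downweight_outliers(losses, outlier_mask, factor=0.5):
    """Illustrative targeted defense (not the paper's exact method):
    compute a weighted mean training loss in which flagged outlier
    samples contribute with weight `factor` < 1, limiting how strongly
    the model memorizes them."""
    weights = np.where(np.asarray(outlier_mask), factor, 1.0)
    return float(np.mean(weights * np.asarray(losses, dtype=float)))
```

A trade-off worth noting: down-weighting hard samples protects them from MIAs but can cost accuracy on exactly those samples, which is the utility/privacy tension the paper sets out to navigate.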

🛡️ Threat Analysis

Membership Inference Attack

The paper's primary contribution is an empirical analysis of membership inference attack (MIA) vulnerability causes beyond overfitting, characterizing which training samples are susceptible and proposing defensive strategies specifically against MIAs.


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
black_box, inference_time
Datasets
CIFAR-10, CIFAR-100
Applications
image classification, clinical record privacy, sensitive data protection