
Membership Inference Attacks Beyond Overfitting

Mona Khalil 1, Alberto Blanco-Justicia 1, Najeeb Jebreel 1, Josep Domingo-Ferrer 1,2

0 citations · 48 references · arXiv


Published on arXiv: 2511.16792

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Outlier training samples (noisy or hard-to-classify examples far from their class centroid) remain vulnerable to membership inference even in well-generalizing, non-overfitted models.
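The "far from their class centroid" criterion can be made concrete with a simple heuristic: compute each class's centroid in some feature space and flag samples whose distance exceeds a per-class quantile. This is a minimal sketch, not the paper's exact procedure; the feature representation and the `quantile` cutoff are assumptions.

```python
import numpy as np

def flag_outliers(features, labels, quantile=0.95):
    """Flag samples far from their class centroid (illustrative heuristic,
    not the paper's exact method). A sample is flagged when its Euclidean
    distance to its class centroid exceeds the per-class `quantile` of
    distances within that class."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    flags = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        centroid = features[idx].mean(axis=0)          # class centroid
        dists = np.linalg.norm(features[idx] - centroid, axis=1)
        cutoff = np.quantile(dists, quantile)          # per-class threshold
        flags[idx] = dists > cutoff
    return flags
```

In practice the features would come from a penultimate-layer embedding of the trained model rather than raw pixels; the function itself is agnostic to that choice.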


Abstract

Membership inference attacks (MIAs) against machine learning (ML) models aim to determine whether a given data point was part of the model training data. These attacks may pose significant privacy risks to individuals whose sensitive data were used for training, which motivates the use of defenses such as differential privacy, often at the cost of high accuracy losses. MIAs exploit the differences in the behavior of a model when making predictions on samples it has seen during training (members) versus those it has not seen (non-members). Several studies have pointed out that model overfitting is the major factor contributing to these differences in behavior and, consequently, to the success of MIAs. However, the literature also shows that even non-overfitted ML models can leak information about a small subset of their training data. In this paper, we investigate the root causes of membership inference vulnerabilities beyond traditional overfitting concerns and suggest targeted defenses. We empirically analyze the characteristics of the training data samples vulnerable to MIAs in models that are not overfitted (and hence able to generalize). Our findings reveal that these samples are often outliers within their classes (e.g., noisy or hard to classify). We then propose potential defensive strategies to protect these vulnerable samples and enhance the privacy-preserving capabilities of ML models. Our code is available at https://github.com/najeebjebreel/mia_analysis.
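The member/non-member behavioral gap the abstract describes is the basis of the classic loss-thresholding attack (in the style of Yeom et al.): predict "member" when the model's per-sample loss is below a threshold, since training samples tend to incur lower loss. A minimal sketch, where the loss values and threshold are stand-in assumptions:

```python
import numpy as np

def loss_threshold_mia(losses, threshold):
    """Loss-thresholding MIA sketch: a sample is predicted to be a
    training member when its loss falls below `threshold`. Choosing
    the threshold (e.g., from the average training loss) is left to
    the attacker's prior knowledge."""
    return np.asarray(losses) < threshold

# Toy illustration: members tend to have lower loss than non-members.
member_losses = np.array([0.05, 0.10, 0.02])
nonmember_losses = np.array([1.2, 0.9, 2.1])
preds = loss_threshold_mia(
    np.concatenate([member_losses, nonmember_losses]), threshold=0.5
)
```

In a well-generalizing model the two loss distributions largely overlap, which is why this attack mostly succeeds on the outlier samples the paper identifies.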


Key Contributions

  • Empirical analysis identifying that outlier samples (far from class centroid) are disproportionately vulnerable to MIAs even in non-overfitted models
  • Systematic characterization of vulnerable samples using visual analysis, feature-space geometry, and model explanation techniques
  • Proposed defensive strategies targeting the identified vulnerable outlier samples to enhance privacy without sacrificing model utility
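One family of targeted defenses consistent with the contributions above is to reduce the training influence of the flagged outlier samples instead of applying uniform protection (such as differential privacy) to all samples. The sketch below is illustrative only, not the paper's specific defense; the weighting `factor` is an assumption.

```python
import numpy as np

def downweight_outliers(losses, outlier_mask, factor=0.5):
    """Illustrative targeted defense (not the paper's exact method):
    compute a weighted mean training loss in which flagged outlier
    samples contribute with weight `factor` < 1, limiting how strongly
    the model memorizes them."""
    weights = np.where(np.asarray(outlier_mask), factor, 1.0)
    return float(np.mean(weights * np.asarray(losses, dtype=float)))
```

A trade-off worth noting: down-weighting hard samples protects them from MIAs but can cost accuracy on exactly those samples, which is the utility/privacy tension the paper sets out to navigate.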

🛡️ Threat Analysis

Membership Inference Attack

The paper's primary contribution is an empirical analysis of membership inference attack (MIA) vulnerability causes beyond overfitting, characterizing which training samples are susceptible and proposing defensive strategies specifically against MIAs.


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
black_box, inference_time
Datasets
CIFAR-10, CIFAR-100
Applications
image classification, clinical record privacy, sensitive data protection