
ImpMIA: Leveraging Implicit Bias for Membership Inference Attack under Realistic Scenarios

Yuval Golbari, Navve Wasserman, Gal Vardi, Michal Irani

Published on arXiv (2510.10625) · 0 citations · 29 references

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

ImpMIA achieves state-of-the-art membership inference performance in realistic settings where only model weights are available and the training set fraction is unknown, outperforming both black-box and white-box baselines without requiring reference model training.

ImpMIA

Novel technique introduced


Determining which data samples were used to train a model, known as a Membership Inference Attack (MIA), is a well-studied and important problem with implications for data privacy. Black-box methods presume access only to the model's outputs and often rely on training auxiliary reference models. While they have shown strong empirical performance, they rest on assumptions that rarely hold in real-world settings: (i) the attacker knows the training hyperparameters; (ii) all available non-training samples come from the same distribution as the training data; and (iii) the fraction of training data in the evaluation set is known. In this paper, we demonstrate that removing these assumptions leads to a significant drop in the performance of black-box attacks. We introduce ImpMIA, a Membership Inference Attack that exploits the implicit bias of neural networks and hence removes the need to rely on reference models and their assumptions. ImpMIA is a white-box attack, a setting which assumes access to model weights and is becoming increasingly realistic given that many models are publicly available (e.g., via Hugging Face). Building on maximum-margin implicit-bias theory, ImpMIA uses the Karush-Kuhn-Tucker (KKT) optimality conditions to identify training samples: it finds the samples whose gradients most strongly reconstruct the trained model's parameters. As a result, ImpMIA achieves state-of-the-art performance compared to both black-box and white-box attacks in realistic settings where only the model weights and a superset of the training data are available.
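The core idea can be illustrated with a toy example. Under maximum-margin implicit bias, a converged model's parameters lie (approximately) in the nonnegative cone spanned by the margin gradients of its training samples, so an attacker can solve a nonnegative least-squares problem over a candidate pool and rank candidates by their reconstruction coefficients. The sketch below is an illustrative simplification for a linear model, not the authors' implementation; all variable names are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# Toy setup: for a linear model f(x) = theta @ x with labels y in {-1, +1},
# the gradient of the margin y * f(x) with respect to theta is y * x.
d, n_members, n_nonmembers = 20, 5, 15
members = rng.normal(size=(n_members, d))
nonmembers = rng.normal(size=(n_nonmembers, d))
y_members = rng.choice([-1.0, 1.0], size=n_members)
y_nonmembers = rng.choice([-1.0, 1.0], size=n_nonmembers)

# Simulate the KKT structure assumed by implicit-bias theory:
# theta = sum_i lambda_i * y_i * x_i, with lambda_i > 0 only on members.
true_lambda = rng.uniform(0.5, 1.5, size=n_members)
theta = (true_lambda[:, None] * y_members[:, None] * members).sum(axis=0)

# Attack: given theta and a candidate pool (a superset of the training data),
# find nonnegative coefficients that best reconstruct theta from the
# candidates' margin gradients; large coefficients suggest membership.
pool = np.vstack([members, nonmembers])
y_pool = np.concatenate([y_members, y_nonmembers])
grads = y_pool[:, None] * pool               # (n_candidates, d) gradients
scores, residual = nnls(grads.T, theta)      # min ||G^T s - theta||, s >= 0

# Members (indices 0..4 here) should dominate the top of the ranking.
ranking = np.argsort(-scores)
print("top-5 candidates:", sorted(ranking[:5].tolist()))
```

In this idealized setting the reconstruction is exact, so the five member indices receive strictly positive coefficients and the non-members get zero; a real deep network requires per-sample gradients of the full parameter vector and only approximately satisfies the KKT conditions.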


Key Contributions

  • ImpMIA: a white-box MIA that exploits maximum-margin implicit bias theory via KKT optimality conditions to identify training samples whose gradients most strongly reconstruct model parameters
  • Demonstrates significant performance degradation of SOTA black-box MIA methods when their unrealistic assumptions (known hyperparameters, in-distribution non-members, known member fraction) are removed
  • Eliminates the need for reference model training, requiring only model weights and a superset of the training data

🛡️ Threat Analysis

Membership Inference Attack

ImpMIA is a membership inference attack — it determines whether specific data points were in a model's training set. The paper's entire contribution is a novel MIA technique that achieves SOTA performance in realistic settings, directly targeting the binary 'was this sample in training?' problem.
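Because the paper stresses that the member fraction is unknown in realistic settings, membership scores are best evaluated threshold-free, e.g., by true-positive rate at a fixed false-positive rate (a standard MIA metric). The helper below is an illustrative sketch of that metric, not code from the paper.

```python
import numpy as np

def tpr_at_fpr(scores, is_member, target_fpr=0.05):
    """TPR at a fixed FPR: evaluates a membership-score ranking without
    assuming the attacker knows the fraction of members in the pool."""
    scores = np.asarray(scores, dtype=float)
    is_member = np.asarray(is_member, dtype=bool)
    # Threshold chosen so that ~target_fpr of non-members exceed it.
    thresh = np.quantile(scores[~is_member], 1.0 - target_fpr)
    return float((scores[is_member] > thresh).mean())

# Toy check: members score higher on average than non-members.
rng = np.random.default_rng(1)
scores = np.concatenate([rng.normal(2.0, 1.0, 100),   # members
                         rng.normal(0.0, 1.0, 100)])  # non-members
labels = np.array([True] * 100 + [False] * 100)
print("TPR@5%FPR:", round(tpr_at_fpr(scores, labels), 2))
```

A perfectly separating attack scores 1.0 on this metric, while random scores hover near the target FPR itself.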


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
white_box, inference_time
Applications
image classification