
The Tail Tells All: Estimating Model-Level Membership Inference Vulnerability Without Reference Models

Euodia Dodd, Nataša Krčo, Igor Shilov, Yves-Alexandre de Montjoye

0 citations · 33 references · arXiv


Published on arXiv: 2510.19773

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

The TNR of a simple loss attack accurately estimates model-level vulnerability to LiRA across diverse architectures and datasets, without requiring any reference models.

TNR loss-tail vulnerability estimator

Novel technique introduced


Membership inference attacks (MIAs) have emerged as the standard tool for evaluating the privacy risks of AI models. However, state-of-the-art attacks require training numerous, often computationally expensive, reference models, limiting their practicality. We present a novel approach for estimating model-level vulnerability to membership inference attacks (the TPR at low FPR) without requiring reference models. Empirical analysis shows loss distributions to be asymmetric and heavy-tailed, and suggests that most points at risk from MIAs move from the tail (high-loss region) to the head (low-loss region) of the distribution during training. We leverage this insight to propose a method for estimating model-level vulnerability from the training and testing distributions alone, using the absence of outliers in the high-loss region as a predictor of risk. We evaluate our method, the TNR of a simple loss attack, across a wide range of architectures and datasets and show it to accurately estimate model-level vulnerability to the state-of-the-art attack, LiRA. We also show our method to outperform both low-cost attacks (using few reference models), such as RMIA, and other measures of distribution difference. Finally, we explore the use of non-linear scoring functions and show the approach to be promising for estimating risk in large language models.
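The estimator described above can be sketched in a few lines. This is a minimal, hypothetical instantiation, not the paper's exact procedure: it assumes the simple loss attack predicts "member" when a point's loss falls below a threshold, and that the threshold is placed at a high quantile of the member (training) loss distribution; the function name and the quantile choice are illustrative.

```python
import numpy as np


def tnr_vulnerability_estimate(train_losses, test_losses, quantile=0.999):
    """Estimate model-level MIA vulnerability via the TNR of a simple
    loss-threshold attack (illustrative sketch; the paper's exact
    thresholding may differ).

    The attack predicts 'member' when loss <= threshold. The TNR is the
    fraction of non-members (test points) whose loss exceeds the
    threshold, i.e. that the attack correctly rejects.
    """
    # Threshold near the top of the member loss distribution: if members'
    # high-loss outliers have moved to the head after training, almost no
    # member exceeds it, while non-member outliers still do.
    threshold = np.quantile(np.asarray(train_losses), quantile)
    tnr = float(np.mean(np.asarray(test_losses) > threshold))
    return tnr
```

A higher TNR indicates cleaner separation between member and non-member loss tails, which the paper finds correlates with vulnerability to LiRA (TPR at low FPR), all without training a single reference model.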


Key Contributions

  • Empirical analysis showing training loss distributions are asymmetric and heavy-tailed, with vulnerable points shifting from the tail to the head after training
  • Proposes TNR of a simple loss attack as a reference-model-free estimator of model-level MIA vulnerability, accurately proxying SOTA LiRA performance
  • Outperforms low-cost attacks (RMIA with few reference models) and distribution-difference measures, with promising extension to LLMs

🛡️ Threat Analysis

Membership Inference Attack

The paper's entire focus is membership inference attacks: it proposes a reference-model-free method for estimating model-level vulnerability (TPR at low FPR) to state-of-the-art MIAs such as LiRA and RMIA, using the TNR of a simple loss attack as a proxy metric.
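The vulnerability metric being proxied, TPR at a fixed low FPR, can be computed directly from attack scores. The sketch below assumes the common convention that higher scores mean "more member-like"; the function name and sign convention are assumptions for illustration.

```python
import numpy as np


def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.001):
    """TPR at a fixed low FPR: the model-level vulnerability metric.

    The decision threshold is set so that at most `target_fpr` of
    non-members score above it; the TPR is then the fraction of
    members whose score clears that threshold.
    """
    member_scores = np.asarray(member_scores)
    nonmember_scores = np.asarray(nonmember_scores)
    # Threshold at the (1 - target_fpr) quantile of non-member scores.
    threshold = np.quantile(nonmember_scores, 1.0 - target_fpr)
    return float(np.mean(member_scores > threshold))
```

Attacks like LiRA are evaluated at FPRs of 0.1% or 0.01%, which is exactly why they need many reference models to calibrate per-example score distributions; the paper's TNR estimator aims to predict this quantity without them.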


Details

Domains
vision, nlp
Model Types
cnn, transformer, llm
Threat Tags
black_box, inference_time
Datasets
CIFAR-10, CIFAR-100
Applications
image classification, language modeling, privacy auditing