
Membership Inference Attacks from Causal Principles

Mathieu Even, Clément Berenfeld, Linus Bleistein, Tudor Cebere, Julie Josse, Aurélien Bellet

0 citations · 72 references · arXiv (Cornell University)


Published on arXiv

arXiv:2602.02819

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Causal MIA estimators enable reliable memorization measurement under distribution shift and without retraining, exposing systematic overestimation of memorization in standard zero-run LLM evaluations.


Membership Inference Attacks (MIAs) are widely used to quantify training data memorization and assess privacy risks. Standard evaluation requires repeated retraining, which is computationally costly for large models. One-run methods (single training with randomized data inclusion) and zero-run methods (post hoc evaluation) are often used instead, though their statistical validity remains unclear. To address this gap, we frame MIA evaluation as a causal inference problem, defining memorization as the causal effect of including a data point in the training set. This novel formulation reveals and formalizes key sources of bias in existing protocols: one-run methods suffer from interference between jointly included points, while zero-run evaluations popular for LLMs are confounded by non-random membership assignment. We derive causal analogues of standard MIA metrics and propose practical estimators for multi-run, one-run, and zero-run regimes with non-asymptotic consistency guarantees. Experiments on real-world data show that our approach enables reliable memorization measurement even when retraining is impractical and under distribution shift, providing a principled foundation for privacy evaluation in modern AI systems.
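The abstract's core idea can be illustrated with a toy multi-run protocol. The sketch below is not the paper's estimator; the mean-fitting "model", the score function, and all parameters are illustrative assumptions. It estimates memorization of a point z as the causal effect of including z in training: the average difference in an MIA-style score between models trained with and without z, over many random training subsets.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_mean(data):
    """Toy 'model': fit the mean of the training data."""
    return data.mean()

def score(model, z):
    """MIA-style score: negative squared loss on z
    (higher = better fit, suggesting membership)."""
    return -(model - z) ** 2

def memorization(z, pool, n_runs=2000, subset_size=20):
    """Multi-run estimate of the causal effect of including z:
    E[score | z in train] - E[score | z out of train],
    averaged over random training subsets."""
    diffs = []
    for _ in range(n_runs):
        subset = rng.choice(pool, size=subset_size, replace=False)
        m_in = train_mean(np.append(subset, z))   # potential outcome: z included
        m_out = train_mean(subset)                # potential outcome: z excluded
        diffs.append(score(m_in, z) - score(m_out, z))
    return float(np.mean(diffs))

pool = rng.normal(0.0, 1.0, size=500)
typical = 0.0   # near the data mean: inclusion barely moves the model
outlier = 5.0   # far from the mean: inclusion moves the model a lot
print(memorization(typical, pool), memorization(outlier, pool))
```

As expected under this formulation, the outlier shows a much larger causal memorization effect than a typical point, since including it shifts the fitted model substantially.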


Key Contributions

  • Causal formalization of MIAs using the potential outcomes framework, defining memorization as the causal effect of training data inclusion
  • Taxonomy of bias sources in MIA evaluation: interference in one-run settings and confounding from non-random membership assignment in zero-run (LLM) evaluations
  • Practical causal estimators for multi-run, one-run, and zero-run regimes with non-asymptotic consistency guarantees that correct distribution shift bias
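The confounding problem in zero-run evaluations, and one standard causal-inference correction for it, can be sketched on synthetic data. This is a generic inverse-propensity-weighting (IPW) illustration under assumed known propensities, not the paper's zero-run estimator: when membership is assigned non-randomly (here driven by a confounder such as a document's popularity, which also affects the model's score), the naive member-vs-non-member contrast is biased, while reweighting by the membership propensity recovers the true causal effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Confounder: e.g. popularity drives both the chance of being in the
# training set and the model's score on the text.
x = rng.normal(0.0, 1.0, size=n)
propensity = 1.0 / (1.0 + np.exp(-x))        # P(member | x), assumed known here
member = rng.random(n) < propensity

tau = 0.5                                     # true causal (memorization) effect
score = tau * member + 1.0 * x + rng.normal(0.0, 0.2, size=n)

# Naive zero-run contrast: biased upward because members have larger x.
naive = score[member].mean() - score[~member].mean()

# IPW contrast: unbiased for the causal effect E[Y(1)] - E[Y(0)].
ipw = (np.mean(member * score / propensity)
       - np.mean((~member) * score / (1.0 - propensity)))

print(f"naive = {naive:.3f}, ipw = {ipw:.3f}, true = {tau}")
```

The naive estimate lands well above the true effect of 0.5, while the propensity-weighted estimate is close to it, mirroring the paper's point that non-random membership assignment confounds zero-run LLM evaluations unless it is explicitly corrected for.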

🛡️ Threat Analysis

Membership Inference Attack

The paper focuses on Membership Inference Attacks: it proposes causal estimators and an evaluation methodology for measuring MIA performance accurately, correcting the biases (interference in one-run settings, confounding in zero-run settings) that distort memorization assessments.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, training_time
Applications
language model privacy auditing, training data memorization measurement