
Membership Inference Attacks from Causal Principles

Mathieu Even, Clément Berenfeld, Linus Bleistein, Tudor Cebere, Julie Josse, Aurélien Bellet

0 citations · 72 references · arXiv (Cornell University)


Published on arXiv

arXiv:2602.02819

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Causal MIA estimators enable reliable memorization measurement under distribution shift and without retraining, exposing systematic overestimation of memorization in standard zero-run LLM evaluations.


Membership Inference Attacks (MIAs) are widely used to quantify training data memorization and assess privacy risks. Standard evaluation requires repeated retraining, which is computationally costly for large models. One-run methods (single training with randomized data inclusion) and zero-run methods (post hoc evaluation) are often used instead, though their statistical validity remains unclear. To address this gap, we frame MIA evaluation as a causal inference problem, defining memorization as the causal effect of including a data point in the training set. This novel formulation reveals and formalizes key sources of bias in existing protocols: one-run methods suffer from interference between jointly included points, while zero-run evaluations popular for LLMs are confounded by non-random membership assignment. We derive causal analogues of standard MIA metrics and propose practical estimators for multi-run, one-run, and zero-run regimes with non-asymptotic consistency guarantees. Experiments on real-world data show that our approach enables reliable memorization measurement even when retraining is impractical and under distribution shift, providing a principled foundation for privacy evaluation in modern AI systems.
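The abstract's core idea can be illustrated with a toy multi-run protocol. The sketch below is not the paper's estimator; the mean-fitting "model", the score function, and all parameters are illustrative assumptions. It estimates memorization of a point z as the causal effect of including z in training: the average difference in an MIA-style score between models trained with and without z, over many random training subsets.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_mean(data):
    """Toy 'model': fit the mean of the training data."""
    return data.mean()

def score(model, z):
    """MIA-style score: negative squared loss on z
    (higher = better fit, suggesting membership)."""
    return -(model - z) ** 2

def memorization(z, pool, n_runs=2000, subset_size=20):
    """Multi-run estimate of the causal effect of including z:
    E[score | z in train] - E[score | z out of train],
    averaged over random training subsets."""
    diffs = []
    for _ in range(n_runs):
        subset = rng.choice(pool, size=subset_size, replace=False)
        m_in = train_mean(np.append(subset, z))   # potential outcome: z included
        m_out = train_mean(subset)                # potential outcome: z excluded
        diffs.append(score(m_in, z) - score(m_out, z))
    return float(np.mean(diffs))

pool = rng.normal(0.0, 1.0, size=500)
typical = 0.0   # near the data mean: inclusion barely moves the model
outlier = 5.0   # far from the mean: inclusion moves the model a lot
print(memorization(typical, pool), memorization(outlier, pool))
```

As expected under this formulation, the outlier shows a much larger causal memorization effect than a typical point, since including it shifts the fitted model substantially.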


Key Contributions

  • Causal formalization of MIAs using the potential outcomes framework, defining memorization as the causal effect of training data inclusion
  • Taxonomy of bias sources in MIA evaluation: interference in one-run settings and confounding from non-random membership assignment in zero-run (LLM) evaluations
  • Practical causal estimators for multi-run, one-run, and zero-run regimes with non-asymptotic consistency guarantees that correct distribution shift bias
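The confounding problem in zero-run evaluations, and one standard causal-inference correction for it, can be sketched on synthetic data. This is a generic inverse-propensity-weighting (IPW) illustration under assumed known propensities, not the paper's zero-run estimator: when membership is assigned non-randomly (here driven by a confounder such as a document's popularity, which also affects the model's score), the naive member-vs-non-member contrast is biased, while reweighting by the membership propensity recovers the true causal effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Confounder: e.g. popularity drives both the chance of being in the
# training set and the model's score on the text.
x = rng.normal(0.0, 1.0, size=n)
propensity = 1.0 / (1.0 + np.exp(-x))        # P(member | x), assumed known here
member = rng.random(n) < propensity

tau = 0.5                                     # true causal (memorization) effect
score = tau * member + 1.0 * x + rng.normal(0.0, 0.2, size=n)

# Naive zero-run contrast: biased upward because members have larger x.
naive = score[member].mean() - score[~member].mean()

# IPW contrast: unbiased for the causal effect E[Y(1)] - E[Y(0)].
ipw = (np.mean(member * score / propensity)
       - np.mean((~member) * score / (1.0 - propensity)))

print(f"naive = {naive:.3f}, ipw = {ipw:.3f}, true = {tau}")
```

The naive estimate lands well above the true effect of 0.5, while the propensity-weighted estimate is close to it, mirroring the paper's point that non-random membership assignment confounds zero-run LLM evaluations unless it is explicitly corrected for.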

🛡️ Threat Analysis

Membership Inference Attack

The paper focuses on Membership Inference Attacks: it proposes causal estimators and an evaluation methodology for measuring MIA performance accurately, correcting the biases (interference in one-run settings, confounding in zero-run settings) that distort memorization assessments.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, training_time
Applications
language model privacy auditing, training data memorization measurement