Powerful Training-Free Membership Inference Against Autoregressive Language Models
David Ilić, David Stanojević, Kostadin Cvejoski
Published on arXiv
2601.12104
Membership Inference Attack
OWASP ML Top 10 — ML04
Key Finding
On WikiText with GPT-2, EZ-MIA achieves 66.3% TPR at 1% FPR (vs. 17.5% prior SOTA) and 14.0% TPR at 0.1% FPR (vs. 1.8% prior SOTA), with AUC 0.98, using only the pretrained base model as reference.
EZ-MIA
Novel technique introduced
Fine-tuned language models pose significant privacy risks, as they may memorize and expose sensitive information from their training data. Membership inference attacks (MIAs) provide a principled framework for auditing these risks, yet existing methods achieve limited detection rates, particularly at the low false-positive thresholds required for practical privacy auditing. We present EZ-MIA, a membership inference attack that exploits a key observation: memorization manifests most strongly at error positions, specifically tokens where the model predicts incorrectly yet still shows elevated probability for training examples. We introduce the Error Zone (EZ) score, which measures the directional imbalance of probability shifts at error positions relative to a pretrained reference model. This principled statistic requires only two forward passes per query and no model training of any kind. On WikiText with GPT-2, EZ-MIA achieves 3.8x higher detection than the previous state-of-the-art under identical conditions (66.3% versus 17.5% true positive rate at 1% false positive rate), with near-perfect discrimination (AUC 0.98). At the stringent 0.1% FPR threshold critical for real-world auditing, we achieve 8x higher detection than prior work (14.0% versus 1.8%), requiring no reference model training. These gains extend to larger architectures: on AG News with Llama-2-7B, we achieve 3x higher detection (46.7% versus 15.8% TPR at 1% FPR). These results establish that privacy risks of fine-tuned language models are substantially greater than previously understood, with implications for both privacy auditing and deployment decisions. Code is available at https://github.com/JetBrains-Research/ez-mia.
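The abstract describes the attack's core statistic: at error positions (tokens the fine-tuned model still predicts incorrectly), memorized training samples tend to show probability shifted upward relative to the pretrained reference. The paper's exact formula is not reproduced here; the sketch below is an illustrative assumption of what a "directional imbalance of probability shifts at error positions" could look like, with `ez_score` and its signed-count statistic being hypothetical names and choices, not the paper's definition. Inputs are per-token true-label probabilities from the two forward passes the abstract mentions (one through the fine-tuned target, one through the pretrained reference).

```python
import numpy as np

def ez_score(p_target, p_ref, is_error):
    """Illustrative Error-Zone-style score (assumed form, not the paper's).

    p_target: per-token probability of the true token under the fine-tuned model
    p_ref:    same probabilities under the pretrained reference model
    is_error: boolean mask of positions where the fine-tuned model's top-1
              prediction is wrong (the "error zone")
    """
    p_target = np.asarray(p_target, dtype=float)
    p_ref = np.asarray(p_ref, dtype=float)
    is_error = np.asarray(is_error, dtype=bool)

    # Probability shifts at error positions only.
    shifts = p_target[is_error] - p_ref[is_error]
    if shifts.size == 0:
        return 0.0

    # Directional imbalance: net fraction of upward shifts in [-1, 1].
    # For training members, shifts should skew positive (elevated probability
    # despite the wrong top-1 prediction); non-members should be near zero.
    return float((np.sum(shifts > 0) - np.sum(shifts < 0)) / shifts.size)
```

A membership decision would then threshold this score, with higher values indicating likely training-set membership. Note the sketch needs only the two probability vectors, matching the abstract's "two forward passes per query and no model training".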
Key Contributions
- Introduces the Error Zone (EZ) score, which measures directional probability imbalance at error token positions relative to a pretrained reference model as a memorization signal
- Demonstrates that EZ-MIA achieves 3.8× higher TPR at 1% FPR and 8× higher TPR at 0.1% FPR than prior SOTA on WikiText/GPT-2, requiring only two forward passes and no model training
- Shows that fine-tuning methodology (full fine-tuning vs. LoRA) is a primary determinant of privacy risk, with a 55× difference in detection rate on the same model and dataset
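The headline metrics above are all true-positive rates at a fixed low false-positive rate, the standard yardstick for practical privacy auditing. As a reminder of what that measures, here is a minimal sketch (my own helper, not code from the paper's repository): pick the score threshold that admits at most the target FPR on non-member scores, then report the fraction of member scores above it.

```python
import numpy as np

def tpr_at_fpr(member_scores, nonmember_scores, fpr=0.01):
    """TPR at a fixed FPR, e.g. fpr=0.01 for 'TPR at 1% FPR'.

    Higher attack scores are assumed to indicate membership. The threshold is
    the (1 - fpr) quantile of non-member scores, so roughly fpr of non-members
    are (falsely) flagged; the return value is the member detection rate.
    """
    nonmember_scores = np.asarray(nonmember_scores, dtype=float)
    member_scores = np.asarray(member_scores, dtype=float)
    threshold = np.quantile(nonmember_scores, 1.0 - fpr)
    return float(np.mean(member_scores > threshold))
```

Under this metric, EZ-MIA's 66.3% TPR at 1% FPR means two-thirds of training samples are detected while only ~1% of non-members are misflagged.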
🛡️ Threat Analysis
EZ-MIA is a direct membership inference attack: it determines whether specific text samples were in the training set of a fine-tuned language model, achieving 3.8× higher TPR at 1% FPR than the prior SOTA. This is the canonical ML04 (Membership Inference Attack) threat.