Gap-K%: Measuring Top-1 Prediction Gap for Detecting Pretraining Data
Published on arXiv: 2601.19936
Membership Inference Attack
OWASP ML Top 10 — ML04
Key Finding
Gap-K% consistently outperforms prior token-likelihood-based membership inference baselines across various LLM sizes and input lengths on WikiMIA and MIMIR benchmarks.
Gap-K% (novel technique introduced)
The opacity of massive pretraining corpora in Large Language Models (LLMs) raises significant privacy and copyright concerns, making pretraining data detection a critical challenge. Existing state-of-the-art methods typically rely on token likelihoods, yet they often overlook both the divergence from the model's top-1 prediction and the local correlations between adjacent tokens. In this work, we propose Gap-K%, a novel pretraining data detection method grounded in the optimization dynamics of LLM pretraining. By analyzing the next-token prediction objective, we observe that discrepancies between the model's top-1 prediction and the target token induce strong gradient signals, which are explicitly penalized during training. Motivated by this, Gap-K% leverages the log probability gap between the top-1 predicted token and the target token, incorporating a sliding window strategy to capture local correlations and mitigate token-level fluctuations. Extensive experiments on the WikiMIA and MIMIR benchmarks demonstrate that Gap-K% achieves state-of-the-art performance, consistently outperforming prior baselines across various model sizes and input lengths.
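The abstract's mechanism can be sketched as follows. This is a hypothetical reconstruction, not the paper's reference implementation: the function name, the use of sliding-window means, and the assumption that the K% of windows with the largest gaps are aggregated (analogous to how Min-K% Prob aggregates the lowest token log-probs) are all my own illustrative choices.

```python
import numpy as np

def gap_k_percent_score(log_probs, target_ids, window=5, k=20.0):
    """Hypothetical sketch of a Gap-K%-style membership score.

    log_probs:  (T, V) array of per-position next-token log probabilities
                (e.g. log-softmaxed logits from a causal LM).
    target_ids: (T,) array of the tokens that actually appeared.
    Returns a scalar; lower values suggest the text was seen in pretraining.
    """
    T = len(target_ids)
    # Per-token gap: top-1 log prob minus target log prob (always >= 0).
    top1_lp = log_probs.max(axis=1)
    target_lp = log_probs[np.arange(T), target_ids]
    gaps = top1_lp - target_lp
    # Sliding-window means capture local correlations and smooth
    # token-level fluctuations, per the abstract.
    window = min(window, T)
    win_scores = np.convolve(gaps, np.ones(window) / window, mode="valid")
    # Assumption: aggregate the K% of windows with the largest gaps.
    # Member text should track the top-1 prediction closely (small gaps),
    # since those discrepancies were penalized during training.
    k_count = max(1, int(len(win_scores) * k / 100.0))
    worst = np.sort(win_scores)[-k_count:]
    return float(worst.mean())
```

A membership decision would then threshold this score (score below a calibrated threshold implies "member"), with the threshold swept to produce the AUC-style results reported on WikiMIA and MIMIR.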
Key Contributions
- Introduces Gap-K%, a pretraining data detection method leveraging the log probability gap between the model's top-1 predicted token and the actual target token, motivated by gradient dynamics of the next-token prediction objective
- Incorporates a sliding window strategy to capture local token correlations and smooth token-level score fluctuations
- Achieves state-of-the-art membership inference performance on WikiMIA and MIMIR benchmarks across multiple model sizes and input lengths
🛡️ Threat Analysis
Gap-K% is a membership inference attack (MIA): it answers the binary question "was this specific text in the LLM's training set?" by exploiting the gap between the model's top-1 prediction and the target token, smoothed over local windows of adjacent tokens. It is evaluated on WikiMIA and MIMIR, the canonical MIA benchmarks.