
Gap-K%: Measuring Top-1 Prediction Gap for Detecting Pretraining Data

Minseo Kwak, Jaehyung Kim

0 citations · 16 references · arXiv


Published on arXiv: 2601.19936

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Gap-K% consistently outperforms prior token-likelihood-based membership inference baselines across various LLM sizes and input lengths on WikiMIA and MIMIR benchmarks.

Gap-K%

Novel technique introduced


The opacity of massive pretraining corpora in Large Language Models (LLMs) raises significant privacy and copyright concerns, making pretraining data detection a critical challenge. Existing state-of-the-art methods typically rely on token likelihoods, yet they often overlook the divergence from the model's top-1 prediction and local correlation between adjacent tokens. In this work, we propose Gap-K%, a novel pretraining data detection method grounded in the optimization dynamics of LLM pretraining. By analyzing the next-token prediction objective, we observe that discrepancies between the model's top-1 prediction and the target token induce strong gradient signals, which are explicitly penalized during training. Motivated by this, Gap-K% leverages the log probability gap between the top-1 predicted token and the target token, incorporating a sliding window strategy to capture local correlations and mitigate token-level fluctuations. Extensive experiments on the WikiMIA and MIMIR benchmarks demonstrate that Gap-K% achieves state-of-the-art performance, consistently outperforming prior baselines across various model sizes and input lengths.
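The abstract's scoring procedure can be sketched in code. The sketch below is an illustrative reconstruction from the description alone, not the authors' implementation: the exact window size, the direction of aggregation, and the choice of aggregating the top K% of windows (by analogy with Min-K%) are all assumptions.

```python
import numpy as np

def gap_k_score(logits, target_ids, window=5, k=0.2):
    """Illustrative Gap-K%-style score (details assumed, not from the paper's code).

    logits: (T, V) array of next-token logits for a candidate text.
    target_ids: (T,) array of the actual next-token ids.
    """
    # Log-softmax over the vocabulary at each position.
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))

    # Per-token gap: log p(top-1 token) - log p(target token). Zero when the
    # model's top-1 prediction matches the target token; positive otherwise.
    top1 = log_probs.max(axis=-1)
    tgt = log_probs[np.arange(len(target_ids)), target_ids]
    gaps = top1 - tgt

    # Sliding-window averaging to capture local correlations between
    # adjacent tokens and smooth token-level fluctuations.
    kernel = np.ones(window) / window
    smoothed = np.convolve(gaps, kernel, mode="valid")

    # Aggregate the k% of windows with the largest gaps (assumed aggregation,
    # by analogy with Min-K%). Lower scores suggest training-set membership,
    # since members tend to sit closer to the model's top-1 predictions.
    n = max(1, int(len(smoothed) * k))
    return float(np.sort(smoothed)[-n:].mean())
```

By construction the score is non-negative, and it is exactly zero when the model's top-1 prediction matches the target at every position.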


Key Contributions

  • Introduces Gap-K%, a pretraining data detection method leveraging the log probability gap between the model's top-1 predicted token and the actual target token, motivated by gradient dynamics of the next-token prediction objective
  • Incorporates a sliding window strategy to capture local token correlations and smooth token-level score fluctuations
  • Achieves state-of-the-art membership inference performance on WikiMIA and MIMIR benchmarks across multiple model sizes and input lengths

🛡️ Threat Analysis

Membership Inference Attack

Gap-K% is a membership inference attack: it answers the binary question "was this specific text in the LLM's training set?" by exploiting top-1 token prediction gaps and local token correlations. It is evaluated on WikiMIA and MIMIR, the canonical MIA benchmarks.
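MIA benchmarks such as WikiMIA and MIMIR typically report threshold-free AUROC over member and non-member scores. A minimal sketch of that metric (a generic pairwise AUROC, not anything specific to this paper; here lower scores are treated as indicating membership):

```python
def auroc(member_scores, nonmember_scores):
    """Pairwise AUROC: fraction of (member, non-member) pairs where the
    member receives the lower (more member-like) score, with ties at 0.5."""
    correct = 0
    ties = 0
    for m in member_scores:
        for n in nonmember_scores:
            if m < n:
                correct += 1
            elif m == n:
                ties += 1
    total = len(member_scores) * len(nonmember_scores)
    return (correct + 0.5 * ties) / total
```

A perfectly separating score yields AUROC 1.0; a score carrying no membership signal yields about 0.5.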


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
grey_box, inference_time, targeted
Datasets
WikiMIA, MIMIR
Applications
llm pretraining data detection, copyright infringement detection, privacy auditing