Provable Training Data Identification for Large Language Models

Zhenlong Liu 1,2, Hao Zeng 1, Weiran Huang 2,3, Hongxin Wei 1

0 citations · 59 references · arXiv

Published on arXiv · 2510.09717

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

PTDI achieves a strictly controlled false identification rate (FIR) while attaining higher statistical power than prior membership inference methods across diverse LLM and VLM settings.

PTDI (Provable Training Data Identification)

Novel technique introduced


Identifying the training data of large-scale models is critical for copyright litigation, privacy auditing, and ensuring fair evaluation. However, existing works typically treat this task as instance-wise identification without controlling the error rate of the identified set, which cannot provide statistically reliable evidence. In this work, we formalize training data identification as a set-level inference problem and propose Provable Training Data Identification (PTDI), a distribution-free approach that enables provable and strict false identification rate (FIR) control. Specifically, our method computes conformal p-values for each data point using a set of known unseen data and then develops a novel Jackknife-corrected Beta boundary (JKBB) estimator to estimate the training-data proportion of the test set, which allows us to scale these p-values. By applying the Benjamini-Hochberg (BH) procedure to the scaled p-values, we select a subset of data points with provable and strict false identification control. Extensive experiments across various models and datasets demonstrate that PTDI achieves higher power than prior methods while strictly controlling the FIR.
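The first step of the pipeline, computing a conformal p-value per test point against scores from known unseen (non-member) data, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the detection score function and its direction (here, higher score = more member-like) are assumptions.

```python
import numpy as np

def conformal_p_values(test_scores, calib_scores):
    """Conformal p-value for each test point, using calibration scores
    computed on data known to be *unseen* by the model.

    Assumes higher score = more member-like, so under the null
    hypothesis "this point is unseen", a small p-value is evidence
    of membership.
    """
    calib = np.asarray(calib_scores, dtype=float)
    n = calib.size
    # Smoothed rank: fraction of calibration scores at least as extreme,
    # with +1 in numerator and denominator for exact validity.
    return np.array(
        [(1 + np.sum(calib >= s)) / (n + 1) for s in np.asarray(test_scores)]
    )
```

For a genuinely unseen test point, this p-value is (super-)uniform on [0, 1]; member points concentrate near 0, which is what the downstream BH step exploits.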


Key Contributions

  • Formalizes training data identification as a set-level inference problem with provable false identification rate (FIR) control, unlike prior instance-wise approaches
  • Proposes the Jackknife-corrected Beta boundary (JKBB) estimator to estimate training-data proportion in the test set, enabling data-dependent p-value scaling
  • Applies the Benjamini-Hochberg procedure to conformal p-values for statistically rigorous, high-power training data identification compatible with both black-box and white-box detection scores
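The selection step described above can be sketched as a standard Benjamini-Hochberg procedure over scaled p-values. In this sketch the `pi_unseen` argument is a hypothetical stand-in for the fraction of non-training data that the paper's JKBB estimator would supply; scaling p-values by a proportion below 1 makes the procedure less conservative, mimicking the adaptive step.

```python
import numpy as np

def bh_select(p_values, alpha=0.05, pi_unseen=1.0):
    """Benjamini-Hochberg selection on (optionally scaled) conformal p-values.

    `pi_unseen`: assumed proportion of non-training (null) points in the
    test set; in PTDI this would come from the JKBB estimator. Returns the
    indices of points identified as training data, with the target error
    rate `alpha` on the identified set.
    """
    p = np.asarray(p_values, dtype=float) * pi_unseen  # scaled p-values
    m = p.size
    order = np.argsort(p)
    sorted_p = p[order]
    # BH step-up thresholds: alpha * k / m for the k-th smallest p-value.
    thresholds = alpha * np.arange(1, m + 1) / m
    passing = np.nonzero(sorted_p <= thresholds)[0]
    if passing.size == 0:
        return np.array([], dtype=int)
    k = passing.max()  # largest k whose p-value clears its threshold
    return np.sort(order[: k + 1])
```

A usage example: `bh_select([0.001, 0.002, 0.5, 0.9], alpha=0.05)` flags the first two points. Because the method is score-agnostic, the same selection logic applies to both black-box and white-box detection scores.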

🛡️ Threat Analysis

Membership Inference Attack

The paper's core contribution is determining whether specific data points were in an LLM's training set — the definition of membership inference. It formalizes and improves MIA methodology by shifting from instance-wise binary classification to set-level inference with statistical guarantees (FIR control), directly applicable to privacy auditing and copyright litigation.


Details

Domains
nlp
Model Types
llm · transformer · vlm
Threat Tags
black_box · white_box · inference_time
Applications
llm training data auditing · copyright litigation · privacy auditing · benchmark contamination detection