
Do LLMs Really Memorize Personally Identifiable Information? Revisiting PII Leakage with a Cue-Controlled Memorization Framework

Xiaoyu Luo, Yiyi Chen, Qiongxiu Li, Johannes Bjerva

1 citation · 39 references · arXiv


Published on arXiv

2601.03791

Membership Inference Attack

OWASP ML Top 10 — ML04

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

When lexical cues are controlled via CRM, PII reconstruction success collapses to near zero across 32 languages, with membership inference yielding near-random accuracy, indicating that prior PII leakage reports substantially overestimate LLM memorization.

Cue-Resistant Memorization (CRM)

Novel technique introduced


Large Language Models (LLMs) have been reported to "leak" Personally Identifiable Information (PII), with successful PII reconstruction often interpreted as evidence of memorization. We propose a principled revision of memorization evaluation for LLMs, arguing that PII leakage should be evaluated under low lexical cue conditions, where target PII cannot be reconstructed through prompt-induced generalization or pattern completion. We formalize Cue-Resistant Memorization (CRM) as a cue-controlled evaluation framework and a necessary condition for valid memorization evaluation, explicitly conditioning on prompt-target overlap cues. Using CRM, we conduct a large-scale multilingual re-evaluation of PII leakage across 32 languages and multiple memorization paradigms. Revisiting reconstruction-based settings, including verbatim prefix-suffix completion and associative reconstruction, we find that their apparent effectiveness is driven primarily by direct surface-form cues rather than by true memorization. When such cues are controlled for, reconstruction success diminishes substantially. We further examine cue-free generation and membership inference, both of which exhibit extremely low true positive rates. Overall, our results suggest that previously reported PII leakage is better explained by cue-driven behavior than by genuine memorization, highlighting the importance of cue-controlled evaluation for reliably quantifying privacy-relevant memorization in LLMs.


Key Contributions

  • Introduces Cue-Resistant Memorization (CRM), a cue-controlled evaluation framework requiring PII reconstruction to succeed under low lexical-cue conditions as a necessary condition for valid memorization evidence.
  • Demonstrates via multilingual re-evaluation across 32 languages that reconstruction-based PII leakage metrics are dominated by surface-form cues (e.g., naming conventions, email patterns), not genuine memorization.
  • Shows that membership inference and cue-free generation both yield near-random true positive rates, suggesting prior work systematically overestimates LLM privacy risk.
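The core of the cue-control idea can be illustrated with a minimal sketch (function names and the overlap metric here are illustrative assumptions, not the paper's implementation): measure lexical overlap between the prompt and the target PII, and evaluate reconstruction only on samples where that overlap is low, so that success cannot be explained by surface-form pattern completion.

```python
def char_ngrams(s, n=3):
    """Set of character n-grams of a lowercased string."""
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def cue_overlap(prompt, target, n=3):
    """Jaccard overlap of character n-grams between prompt and target PII.

    A high value means the prompt itself carries surface-form cues
    (shared name fragments, email patterns) about the target.
    """
    p, t = char_ngrams(prompt, n), char_ngrams(target, n)
    if not p or not t:
        return 0.0
    return len(p & t) / len(p | t)

def low_cue_subset(samples, threshold=0.1):
    """Keep only (prompt, target) pairs with few prompt-to-target cues."""
    return [(p, t) for p, t in samples if cue_overlap(p, t) < threshold]
```

Under a filter like this, a prompt that already contains the person's name is excluded from the evaluation of whether their email address was memorized, which is the low-lexical-cue condition CRM formalizes.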

🛡️ Threat Analysis

Membership Inference Attack

Explicitly evaluates 8 membership inference methods across languages as one of the core memorization paradigms, finding near-random true positive rates when cues are controlled.
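A minimal sketch of the standard way such results are scored (not the paper's code; names and thresholds are illustrative): membership inference is typically reported as true positive rate at a fixed low false positive rate, and a near-random attack yields TPR roughly equal to the chosen FPR.

```python
def tpr_at_fpr(member_scores, nonmember_scores, fpr=0.01):
    """TPR of a score-thresholding attack at a fixed low FPR.

    The threshold is set from the non-member score distribution so that
    about `fpr` of non-members would be (wrongly) flagged as members.
    """
    nonmembers = sorted(nonmember_scores, reverse=True)
    k = max(int(len(nonmembers) * fpr), 1)
    threshold = nonmembers[k - 1]
    tp = sum(s > threshold for s in member_scores)
    return tp / len(member_scores)
```

If member and non-member scores are drawn from the same distribution, the TPR at 1% FPR hovers around 0.01, which is the "near-random" baseline the paper's cue-controlled results approach.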


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box, inference_time
Datasets
multilingual PII datasets (32 languages), mGPT3-13B training corpus
Applications
llm privacy evaluation, pii leakage assessment, training data extraction