Do LLMs Really Memorize Personally Identifiable Information? Revisiting PII Leakage with a Cue-Controlled Memorization Framework
Xiaoyu Luo, Yiyi Chen, Qiongxiu Li, Johannes Bjerva
Published on arXiv: 2601.03791
Membership Inference Attack
OWASP ML Top 10 — ML04
Sensitive Information Disclosure
OWASP LLM Top 10 — LLM06
Key Finding
When lexical cues are controlled via CRM, PII reconstruction success collapses to near zero across 32 languages, with membership inference yielding near-random accuracy, indicating that prior PII leakage reports substantially overestimate LLM memorization.
Cue-Resistant Memorization (CRM)
Novel technique introduced
Large Language Models (LLMs) have been reported to "leak" Personally Identifiable Information (PII), with successful PII reconstruction often interpreted as evidence of memorization. We propose a principled revision of memorization evaluation for LLMs, arguing that PII leakage should be evaluated under low lexical cue conditions, where target PII cannot be reconstructed through prompt-induced generalization or pattern completion. We formalize Cue-Resistant Memorization (CRM) as a cue-controlled evaluation framework and a necessary condition for valid memorization evaluation, explicitly conditioning on prompt-target overlap cues. Using CRM, we conduct a large-scale multilingual re-evaluation of PII leakage across 32 languages and multiple memorization paradigms. Revisiting reconstruction-based settings, including verbatim prefix-suffix completion and associative reconstruction, we find that their apparent effectiveness is driven primarily by direct surface-form cues rather than by true memorization. When such cues are controlled for, reconstruction success diminishes substantially. We further examine cue-free generation and membership inference, both of which exhibit extremely low true positive rates. Overall, our results suggest that previously reported PII leakage is better explained by cue-driven behavior than by genuine memorization, highlighting the importance of cue-controlled evaluation for reliably quantifying privacy-relevant memorization in LLMs.
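The core idea of conditioning on prompt-target overlap can be illustrated with a toy filter. The sketch below is not the paper's implementation; the tokenizer, overlap metric, and threshold are illustrative assumptions. It scores how much of a target PII string is already telegraphed by surface-form tokens in the prompt, and keeps only low-cue (prompt, target) pairs for evaluation.

```python
import re

def lexical_overlap(prompt: str, target: str) -> float:
    """Fraction of the target's alphanumeric tokens that already appear
    in the prompt (a crude proxy for surface-form cues)."""
    prompt_tokens = set(re.findall(r"[a-z0-9]+", prompt.lower()))
    target_tokens = re.findall(r"[a-z0-9]+", target.lower())
    if not target_tokens:
        return 0.0
    hits = sum(tok in prompt_tokens for tok in target_tokens)
    return hits / len(target_tokens)

def low_cue_pairs(pairs, max_overlap=0.0):
    """Keep only pairs whose target cannot be pattern-completed
    from tokens already present in the prompt."""
    return [(p, t) for p, t in pairs if lexical_overlap(p, t) <= max_overlap]

pairs = [
    ("Contact john.smith at", "john.smith@example.com"),  # email echoes the prompt
    ("The patient can be reached at", "555-0147"),        # no surface cue
]
filtered = low_cue_pairs(pairs)  # only the phone-number pair survives
```

Under this kind of filter, a "reconstruction" of `john.smith@example.com` from a prompt that already contains `john.smith` would not count as memorization evidence, since an email-pattern completion suffices.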
Key Contributions
- Introduces Cue-Resistant Memorization (CRM), a cue-controlled evaluation framework requiring PII reconstruction to succeed under low lexical-cue conditions as a necessary condition for valid memorization evidence.
- Demonstrates via multilingual re-evaluation across 32 languages that reconstruction-based PII leakage metrics are dominated by surface-form cues (e.g., naming conventions, email patterns), not genuine memorization.
- Shows that membership inference and cue-free generation both yield near-random true positive rates, suggesting prior work systematically overestimates LLM privacy risk.
🛡️ Threat Analysis
Evaluates eight membership inference methods across languages as one of the core memorization paradigms, finding near-random true positive rates once lexical cues are controlled.
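A near-random membership inference result can be made concrete with a simple loss-thresholding attack, a standard baseline (not one of the paper's eight methods specifically; the Gaussian losses here are synthetic stand-ins). When member and non-member losses come from the same distribution, the true positive rate at any fixed false positive rate collapses to roughly the FPR itself, i.e. chance level.

```python
import random

random.seed(0)
# Synthetic per-example losses: under (near-)zero memorization, member and
# non-member examples are indistinguishable by loss.
member_losses = [random.gauss(2.0, 0.5) for _ in range(2000)]
nonmember_losses = [random.gauss(2.0, 0.5) for _ in range(2000)]

def tpr_at_fpr(members, nonmembers, fpr=0.01):
    """Loss-threshold attack: predict 'member' when loss falls below a
    threshold set to the fpr-quantile of the non-member losses."""
    threshold = sorted(nonmembers)[int(fpr * len(nonmembers))]
    return sum(loss < threshold for loss in members) / len(members)

tpr = tpr_at_fpr(member_losses, nonmember_losses, fpr=0.01)
# With identical loss distributions, TPR stays close to the 1% FPR budget.
```

Reporting TPR at a low fixed FPR (rather than average-case accuracy) is the usual way to surface whether an attack does better than chance on any subpopulation.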