Attack · 2025

(Token-Level) InfoRMIA: Stronger Membership Inference and Memorization Assessment for LLMs

Jiashu Tao 1, Reza Shokri 1,2

0 citations · 28 references

Published on arXiv: 2510.05582

Membership Inference Attack (OWASP ML Top 10 — ML04)

Sensitive Information Disclosure (OWASP LLM Top 10 — LLM06)

Key Finding

InfoRMIA outperforms RMIA across LLM benchmarks while additionally enabling fine-grained token-level localization of memorized training data within model outputs

InfoRMIA

Novel technique introduced


Abstract

Machine learning models are known to leak sensitive information, as they inevitably memorize (parts of) their training data. More alarmingly, large language models (LLMs) are now trained on nearly all available data, which amplifies the magnitude of information leakage and raises serious privacy risks. Hence, it is more crucial than ever to quantify privacy risk before the release of LLMs. The standard method to quantify privacy is via membership inference attacks, where the state-of-the-art approach is the Robust Membership Inference Attack (RMIA). In this paper, we present InfoRMIA, a principled information-theoretic formulation of membership inference. Our method consistently outperforms RMIA across benchmarks while also offering improved computational efficiency. In the second part of the paper, we identify the limitations of treating sequence-level membership inference as the gold standard for measuring leakage. We propose a new perspective for studying membership and memorization in LLMs: token-level signals and analyses. We show that a simple token-based InfoRMIA can pinpoint which tokens are memorized within generated outputs, thereby localizing leakage from the sequence level down to individual tokens, while achieving stronger sequence-level inference power on LLMs. This new scope rethinks privacy in LLMs and can lead to more targeted mitigation, such as exact unlearning.
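To make the attack setting concrete: a membership inference attack assigns each candidate sequence a score and predicts "member" when the score is high. The sketch below shows a generic likelihood-ratio style score comparing the target model's confidence against reference models trained without the candidate. This is an illustrative toy (plain probabilities, stdlib only), not the paper's exact InfoRMIA statistic; the function names and toy numbers are assumptions for demonstration.

```python
import math

def sequence_log_likelihood(token_probs):
    """Sum of log-probabilities a model assigns to each token of a sequence."""
    return sum(math.log(p) for p in token_probs)

def membership_score(target_probs, reference_probs_list):
    """Likelihood-ratio style membership score (illustrative, not the
    paper's InfoRMIA statistic): the target model's log-likelihood minus
    the average log-likelihood under reference models. Higher scores
    suggest the sequence was memorized, i.e. a training-set member."""
    target_ll = sequence_log_likelihood(target_probs)
    avg_ref_ll = sum(
        sequence_log_likelihood(r) for r in reference_probs_list
    ) / len(reference_probs_list)
    return target_ll - avg_ref_ll

# Toy example: the target model is far more confident on this sequence
# than the reference models, yielding a positive (member-like) score.
target = [0.9, 0.8, 0.95]
references = [[0.4, 0.5, 0.3], [0.5, 0.4, 0.35]]
print(membership_score(target, references) > 0)  # True
```

In practice the per-token probabilities would come from the LLM's softmax outputs, and a decision threshold would be calibrated to a target false-positive rate.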


Key Contributions

  • InfoRMIA: an information-theoretic reformulation of membership inference that consistently outperforms RMIA with improved computational efficiency across LLM benchmarks
  • Token-level membership inference that pinpoints which individual tokens within generated outputs are memorized, localizing privacy leakage from sequence to token granularity
  • New perspective on LLM privacy assessment via token-level signals, enabling more targeted mitigation such as exact unlearning of memorized tokens
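The token-level perspective above can be sketched as follows: score each token by how anomalously confident the target model is relative to a reference, then flag tokens above a cutoff. This is a hedged toy illustration of the localization idea, not the paper's exact token-based InfoRMIA; the token strings, probabilities, and `threshold` value are all hypothetical.

```python
import math

def token_scores(target_probs, reference_probs):
    """Per-token log-likelihood ratio between the target model and a
    reference model; large values flag tokens the target model is
    anomalously confident about (illustrative sketch)."""
    return [math.log(t) - math.log(r)
            for t, r in zip(target_probs, reference_probs)]

def flag_memorized_tokens(tokens, target_probs, reference_probs, threshold=1.0):
    """Return tokens whose score exceeds `threshold` (a hypothetical
    cutoff), localizing leakage from the sequence to individual tokens."""
    scores = token_scores(target_probs, reference_probs)
    return [tok for tok, s in zip(tokens, scores) if s > threshold]

# Toy example: both models predict the common words equally well, but the
# target model is suspiciously confident on "42" and "Elm".
tokens = ["Alice", "lives", "at", "42", "Elm"]
target = [0.30, 0.50, 0.60, 0.90, 0.95]
reference = [0.30, 0.50, 0.60, 0.05, 0.04]
print(flag_memorized_tokens(tokens, target, reference))  # ['42', 'Elm']
```

This token-level localization is what enables the targeted mitigation mentioned above: only the flagged tokens need to be addressed, e.g. via exact unlearning.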

🛡️ Threat Analysis

Membership Inference Attack

The core contribution is InfoRMIA, a principled information-theoretic formulation of membership inference that consistently outperforms the state-of-the-art RMIA; a token-level variant achieves stronger sequence-level inference while pinpointing individual memorized tokens.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time
Applications
large language models, privacy risk quantification, memorization assessment