
Rep2Text: Decoding Full Text from a Single LLM Token Representation

Haiyan Zhao 1, Zirui He 1, Fan Yang 2, Ali Payani 1, Mengnan Du 3

0 citations · 21 references · arXiv

Published on arXiv · 2511.06571

Model Inversion Attack

OWASP ML Top 10 — ML03

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

Rep2Text recovers over 50% of token-level information from 16-token input sequences using a single LLM last-token representation, demonstrating substantial privacy leakage from compressed internal representations.

Rep2Text

Novel technique introduced


Large language models (LLMs) have achieved remarkable progress across diverse tasks, yet their internal mechanisms remain largely opaque. In this work, we address a fundamental question: to what extent can the original input text be recovered from a single last-token representation within an LLM? We propose Rep2Text, a novel framework for decoding full text from last-token representations. Rep2Text employs a trainable adapter that projects a target model's internal representations into the embedding space of a decoding language model, which then autoregressively reconstructs the input text. Experiments on various model combinations (Llama-3.1-8B, Gemma-7B, Mistral-7B-v0.1, Llama-3.2-3B) demonstrate that, on average, over half of the information in 16-token sequences can be recovered from this compressed representation while maintaining strong semantic integrity and coherence. Furthermore, our analysis reveals an information bottleneck effect: longer sequences exhibit decreased token-level recovery while preserving strong semantic integrity. Additionally, the framework demonstrates robust generalization to out-of-distribution medical data.
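The adapter described above can be sketched at toy scale. This is not the paper's implementation: the dimensions, the single-linear-map architecture, and the `adapt` helper are all illustrative assumptions; Rep2Text's adapter is trained end-to-end, whereas the weights here are random and exist only to show the shapes involved in projecting one last-token representation into a decoder's embedding space as a soft prefix.

```python
import random

random.seed(0)

D_TARGET = 8  # hidden size of the target model (toy scale; real LLMs use thousands of dims)
D_DEC = 6     # embedding size of the decoding model
K = 4         # number of soft-prefix embeddings the adapter emits (assumed)

# Hypothetical linear adapter: one weight matrix per prefix slot.
# In Rep2Text this adapter is trained; here it is randomly initialized.
W = [[[random.uniform(-0.1, 0.1) for _ in range(D_TARGET)]
      for _ in range(D_DEC)]
     for _ in range(K)]

def adapt(last_token_rep):
    """Project a single last-token representation (length D_TARGET) into
    K soft-prefix embeddings (each length D_DEC) that a decoding LM would
    consume before autoregressively generating the reconstructed text."""
    assert len(last_token_rep) == D_TARGET
    return [[sum(row[j] * last_token_rep[j] for j in range(D_TARGET))
             for row in W[k]]
            for k in range(K)]

rep = [random.gauss(0.0, 1.0) for _ in range(D_TARGET)]
prefix = adapt(rep)
print(len(prefix), len(prefix[0]))  # → 4 6
```

The key design point is that the adapter bridges two incompatible vector spaces: the decoder never sees the target model's activations directly, only their learned projection into its own embedding space.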


Key Contributions

  • Rep2Text framework: a trainable adapter that projects last-token LLM representations into a decoding LLM's embedding space for autoregressive input reconstruction
  • Empirical demonstration that >50% of token-level information in 16-token sequences is recoverable from a single last-token representation across multiple model families (Llama-3.1-8B, Gemma-7B, Mistral-7B-v0.1, Llama-3.2-3B)
  • Analysis of an information bottleneck effect: longer sequences show reduced token-level recovery while preserving semantic coherence, and the framework generalizes to out-of-distribution medical text
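One way to make the ">50% of token-level information" claim concrete is a multiset-overlap recovery score between the original and reconstructed token sequences. This is a plausible reading of "token-level recovery", not necessarily the paper's exact metric:

```python
from collections import Counter

def token_recovery(original_tokens, reconstructed_tokens):
    """Fraction of original tokens that reappear in the reconstruction,
    counted as a multiset overlap (duplicates matched at most as often
    as they occur in the original)."""
    orig = Counter(original_tokens)
    rec = Counter(reconstructed_tokens)
    matched = sum(min(orig[t], rec[t]) for t in orig)
    return matched / max(1, sum(orig.values()))

# Hypothetical example in the spirit of the paper's medical OOD setting.
original = "the patient was admitted with acute chest pain and shortness of breath".split()
reconstructed = "patient admitted with chest pain and breathing difficulty".split()
print(round(token_recovery(original, reconstructed), 2))  # → 0.5
```

Note how a reconstruction can score only ~50% at the token level yet preserve the gist, which mirrors the information-bottleneck observation: semantic integrity survives even as exact token recovery degrades with length.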

🛡️ Threat Analysis

Model Inversion Attack

Rep2Text is an embedding inversion framework: it trains an adapter to project an LLM's internal representations back into a decoding model's embedding space, which then autoregressively reconstructs the original input text. This matches the OWASP ML03 criterion of "embedding inversion (recovering text/data from embedding vectors)". The adversary test passes: the method demonstrates that an adversary with access to an LLM's internal activations can reconstruct private user inputs.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
white_box, inference_time
Datasets
medical (out-of-distribution generalization dataset, unspecified)
Applications
text privacy, llm interpretability, medical text confidentiality