
Rep2Text: Decoding Full Text from a Single LLM Token Representation

Haiyan Zhao 1, Zirui He 1, Fan Yang 2, Ali Payani 1, Mengnan Du 3

0 citations · 21 references · arXiv

Published on arXiv · 2511.06571

Model Inversion Attack

OWASP ML Top 10 — ML03

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

Rep2Text recovers over 50% of token-level information from 16-token input sequences using a single LLM last-token representation, demonstrating substantial privacy leakage from compressed internal representations.

Rep2Text

Novel technique introduced


Large language models (LLMs) have achieved remarkable progress across diverse tasks, yet their internal mechanisms remain largely opaque. In this work, we address a fundamental question: to what extent can the original input text be recovered from a single last-token representation within an LLM? We propose Rep2Text, a novel framework for decoding full text from last-token representations. Rep2Text employs a trainable adapter that projects a target model's internal representations into the embedding space of a decoding language model, which then autoregressively reconstructs the input text. Experiments on various model combinations (Llama-3.1-8B, Gemma-7B, Mistral-7B-v0.1, Llama-3.2-3B) demonstrate that, on average, over half of the information in 16-token sequences can be recovered from this compressed representation while maintaining strong semantic integrity and coherence. Furthermore, our analysis reveals an information bottleneck effect: longer sequences exhibit decreased token-level recovery while preserving strong semantic integrity. Additionally, the framework demonstrates robust generalization to out-of-distribution medical data.
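The adapter described above can be sketched at toy scale. This is not the paper's implementation: the dimensions, the single-linear-map architecture, and the `adapt` helper are all illustrative assumptions; Rep2Text's adapter is trained end-to-end, whereas the weights here are random and exist only to show the shapes involved in projecting one last-token representation into a decoder's embedding space as a soft prefix.

```python
import random

random.seed(0)

D_TARGET = 8  # hidden size of the target model (toy scale; real LLMs use thousands of dims)
D_DEC = 6     # embedding size of the decoding model
K = 4         # number of soft-prefix embeddings the adapter emits (assumed)

# Hypothetical linear adapter: one weight matrix per prefix slot.
# In Rep2Text this adapter is trained; here it is randomly initialized.
W = [[[random.uniform(-0.1, 0.1) for _ in range(D_TARGET)]
      for _ in range(D_DEC)]
     for _ in range(K)]

def adapt(last_token_rep):
    """Project a single last-token representation (length D_TARGET) into
    K soft-prefix embeddings (each length D_DEC) that a decoding LM would
    consume before autoregressively generating the reconstructed text."""
    assert len(last_token_rep) == D_TARGET
    return [[sum(row[j] * last_token_rep[j] for j in range(D_TARGET))
             for row in W[k]]
            for k in range(K)]

rep = [random.gauss(0.0, 1.0) for _ in range(D_TARGET)]
prefix = adapt(rep)
print(len(prefix), len(prefix[0]))  # → 4 6
```

The key design point is that the adapter bridges two incompatible vector spaces: the decoder never sees the target model's activations directly, only their learned projection into its own embedding space.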


Key Contributions

  • Rep2Text framework: a trainable adapter that projects last-token LLM representations into a decoding LLM's embedding space for autoregressive input reconstruction
  • Empirical demonstration that >50% of token-level information in 16-token sequences is recoverable from a single last-token representation across multiple model families (Llama-3.1-8B, Gemma-7B, Mistral-7B-v0.1, Llama-3.2-3B)
  • Analysis of an information bottleneck effect: longer sequences show reduced token-level recovery while preserving semantic coherence, and the framework generalizes to out-of-distribution medical text
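One way to make the ">50% of token-level information" claim concrete is a multiset-overlap recovery score between the original and reconstructed token sequences. This is a plausible reading of "token-level recovery", not necessarily the paper's exact metric:

```python
from collections import Counter

def token_recovery(original_tokens, reconstructed_tokens):
    """Fraction of original tokens that reappear in the reconstruction,
    counted as a multiset overlap (duplicates matched at most as often
    as they occur in the original)."""
    orig = Counter(original_tokens)
    rec = Counter(reconstructed_tokens)
    matched = sum(min(orig[t], rec[t]) for t in orig)
    return matched / max(1, sum(orig.values()))

# Hypothetical example in the spirit of the paper's medical OOD setting.
original = "the patient was admitted with acute chest pain and shortness of breath".split()
reconstructed = "patient admitted with chest pain and breathing difficulty".split()
print(round(token_recovery(original, reconstructed), 2))  # → 0.5
```

Note how a reconstruction can score only ~50% at the token level yet preserve the gist, which mirrors the information-bottleneck observation: semantic integrity survives even as exact token recovery degrades with length.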

🛡️ Threat Analysis

Model Inversion Attack

Rep2Text is an embedding inversion framework: it trains an adapter to project an LLM's internal representations back into a decoding model's embedding space, which then autoregressively reconstructs the original input text. This matches the OWASP ML03 criterion of "embedding inversion (recovering text/data from embedding vectors)". The adversary test passes: the method demonstrates that an adversary with access to an LLM's internal activations can reconstruct private user inputs.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
white_box, inference_time
Datasets
medical (out-of-distribution generalization dataset, unspecified)
Applications
text privacy, llm interpretability, medical text confidentiality