Embedding Inversion via Conditional Masked Diffusion Language Models
Published on arXiv
2602.11047
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
Recovers up to 81.3% of tokens on 32-token sequences across three embedding models using only 8 forward passes, with no access to the target encoder at inference time.
Conditional Masked Diffusion Language Model (CMDLM)
Novel technique introduced
We frame embedding inversion as conditional masked diffusion, recovering all tokens in parallel through iterative denoising rather than sequential autoregressive generation. A masked diffusion language model is conditioned on the target embedding via adaptive layer normalization, requiring only 8 forward passes through a 78M parameter model with no access to the target encoder. On 32-token sequences across three embedding models, the method achieves up to 81.3% token accuracy. Source code and live demo are available at https://github.com/jina-ai/embedding-inversion-demo.
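The parallel denoising loop described above can be sketched as follows. This is a minimal numpy toy, not the paper's implementation: `toy_denoiser` is a random stand-in for the 78M-parameter CMDLM, and the linear confidence-based unmasking schedule is an assumption (the paper does not specify its schedule here). The point is the shape of the algorithm: start fully masked, and in each of 8 passes predict every position at once, committing only the most confident tokens.

```python
import numpy as np

SEQ_LEN, VOCAB, STEPS = 32, 1000, 8
MASK = -1  # sentinel id for a masked position

def toy_denoiser(tokens, embedding):
    """Stand-in for the conditional masked diffusion LM: returns
    per-position logits over the vocabulary. Here they are random,
    seeded from the conditioning embedding, purely for illustration."""
    seed = int(np.abs(embedding).sum() * 1e6) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.normal(size=(SEQ_LEN, VOCAB))

def invert(embedding, steps=STEPS):
    """Recover all positions in parallel over `steps` denoising passes."""
    tokens = np.full(SEQ_LEN, MASK)
    for step in range(steps):
        masked = np.where(tokens == MASK)[0]
        logits = toy_denoiser(tokens, embedding)[masked]
        # softmax over the vocabulary to get per-position confidences
        probs = np.exp(logits - logits.max(-1, keepdims=True))
        probs /= probs.sum(-1, keepdims=True)
        preds, conf = probs.argmax(-1), probs.max(-1)
        # hypothetical linear schedule: commit the most confident
        # predictions, leaving the rest masked for the next pass
        target = int(np.ceil(SEQ_LEN * (step + 1) / steps))
        n_new = target - (SEQ_LEN - masked.size)
        order = np.argsort(conf)[::-1][:n_new]
        tokens[masked[order]] = preds[order]
    return tokens

emb = np.random.default_rng(0).normal(size=128)
recovered = invert(emb)  # all 32 positions filled after 8 passes
```

Unlike autoregressive decoding, which needs one forward pass per token, this loop's cost is fixed at the number of denoising steps regardless of sequence length.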
Key Contributions
- Frames embedding inversion as conditional masked diffusion, enabling all-position parallel token recovery instead of sequential autoregressive generation
- Injects target embedding into each transformer layer via adaptive layer normalization (AdaLN), making the attack encoder-agnostic with no access to the target encoder at inference time
- Achieves up to 81.3% token accuracy on 32-token sequences using only 8 forward passes through a 78M parameter model, without iterative re-embedding or architecture-specific alignment
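The AdaLN conditioning in the second contribution can be illustrated with a small numpy sketch. This is not the paper's code: the projection matrices, dimensions, and the `h_norm * (1 + scale) + shift` modulation form are assumptions based on standard adaptive layer normalization, where per-channel scale and shift are regressed from the conditioning vector (here, the target embedding) at each transformer layer.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_emb, seq = 64, 128, 32

# Hypothetical learned projections from the target embedding to a
# per-channel scale and shift (one pair per transformer layer).
W_scale = rng.normal(scale=0.02, size=(d_emb, d_model))
W_shift = rng.normal(scale=0.02, size=(d_emb, d_model))

def adaln(h, emb, W_scale, W_shift, eps=1e-5):
    """Adaptive layer norm: normalize the hidden states, then modulate
    them with a scale and shift derived from the conditioning embedding."""
    h_norm = (h - h.mean(-1, keepdims=True)) / np.sqrt(
        h.var(-1, keepdims=True) + eps
    )
    return h_norm * (1.0 + emb @ W_scale) + emb @ W_shift

h = rng.normal(size=(seq, d_model))    # hidden states for 32 positions
emb = rng.normal(size=d_emb)           # target embedding to invert
out = adaln(h, emb, W_scale, W_shift)  # same shape as h, now conditioned
```

Because the embedding enters only through these normalization parameters rather than through encoder-specific cross-attention, the same diffusion backbone can be conditioned on vectors from different embedding models, which is what makes the attack encoder-agnostic at inference time.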
🛡️ Threat Analysis
Embedding inversion is explicitly listed under ML03: the adversary holds an embedding vector and reconstructs the original input text. The paper directly undermines the common privacy assumption that text embeddings are "safe, anonymized representations," recovering up to 81.3% of tokens with a diffusion model conditioned on the target embedding.