attack arXiv Feb 11, 2026
Han Xiao · Jina AI by Elastic
Parallel masked diffusion attack recovers 81% of tokens from text embeddings without accessing the target encoder
Model Inversion Attack nlp
We frame embedding inversion as conditional masked diffusion, recovering all tokens in parallel through iterative denoising rather than sequential autoregressive generation. A masked diffusion language model is conditioned on the target embedding via adaptive layer normalization, requiring only 8 forward passes through a 78M parameter model with no access to the target encoder. On 32-token sequences across three embedding models, the method achieves up to 81.3% token accuracy. Source code and live demo are available at https://github.com/jina-ai/embedding-inversion-demo.
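The parallel decoding loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `denoise` callable, the `MASK` id, the equal-share unmasking schedule, and the "most confident first" rule are all assumptions made for illustration, and the toy denoiser ignores the embedding where a real model would condition on it (the abstract says via adaptive layer normalization). The sketch only shows how a fully masked 32-token sequence can be filled in 8 forward passes, and how token accuracy would be scored.

```python
import math

MASK = -1  # hypothetical id for the mask token


def decode_parallel(denoise, embedding, length=32, steps=8):
    """Fill a fully masked sequence using at most `steps` denoiser calls.

    `denoise(tokens, embedding)` is a hypothetical callable that returns one
    (token_id, confidence) pair per position.
    """
    tokens = [MASK] * length
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        preds = denoise(tokens, embedding)  # one forward pass over all positions
        # Reveal an equal share of the remaining masks; the last step takes the rest.
        quota = math.ceil(len(masked) / (steps - step))
        # Assumed rule: commit the most confident predictions first.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:quota]:
            tokens[i] = preds[i][0]
    return tokens


def token_accuracy(predicted, reference):
    """Fraction of positions where the recovered token equals the original."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def make_toy_denoiser(target):
    """Toy stand-in for the conditioned model; it always predicts `target`."""
    calls = []

    def denoise(tokens, embedding):
        calls.append(1)  # count forward passes
        # Ignores `embedding`; a real model would condition on it.
        return [(tok, 1.0 / (1 + i)) for i, tok in enumerate(target)]

    return denoise, calls


target = [(7 * i + 3) % 100 for i in range(32)]
denoise, calls = make_toy_denoiser(target)
recovered = decode_parallel(denoise, embedding=[0.0] * 4)
print(len(calls), token_accuracy(recovered, target))
```

Because the toy denoiser always predicts the target, the sketch demonstrates the schedule and the metric only, not the difficulty of the inversion problem.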
diffusion transformer