TextCrafter: Optimization-Calibrated Noise for Defending Against Text Embedding Inversion
Duoxun Tang 1, Xinhang Jiang 2, Jiajun Niu 1
Published on arXiv
2509.17302
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
TextCrafter maintains ≥70% classification accuracy while keeping BLEU<3 and ROUGE-L<15 against Vec2Text inversion, consistently outperforming Gaussian noise and metric-LDP baselines across lower privacy budgets.
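The privacy thresholds above are reconstruction-overlap scores between the original sentence and the attacker's Vec2Text output. Lower scores mean the inversion recovered less of the text. As a rough illustration of what "ROUGE-L<15" measures, here is a minimal sketch of the ROUGE-L F-measure, which is based on the longest common subsequence (LCS) of tokens and reported on a 0 to 100 scale. It makes several assumptions that are not from the source: it uses plain lowercase whitespace tokenization and no stemming, and the names `lcs_length` and `rouge_l` are hypothetical. The paper's evaluation likely uses a standard ROUGE package whose tokenization may differ.

```python
def lcs_length(a, b):
    # Dynamic-programming longest common subsequence over two token lists,
    # keeping only the previous row to use O(len(b)) memory.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(reference, candidate, beta=1.0):
    # ROUGE-L F-measure scaled to 0..100 (assumption: whitespace tokens, lowercase).
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)
    precision = lcs / len(cand)
    f = (1 + beta**2) * precision * recall / (recall + beta**2 * precision)
    return 100.0 * f
```

For example, `rouge_l("the patient was born in boston", "the patient lives in paris")` shares the subsequence "the patient in" (3 tokens). That gives recall 3/6 and precision 3/5, so the score is about 54.5. A defense reaching ROUGE-L<15 keeps reconstructions far below that level of overlap.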
TextCrafter
Novel technique introduced
Text embedding inversion attacks reconstruct original sentences from latent representations, posing severe privacy threats in collaborative inference and edge computing. We propose TextCrafter, an optimization-based adversarial perturbation mechanism that suppresses inversion while preserving task utility. It combines RL-learned, geometry-aware noise injection orthogonal to user embeddings with cluster priors and PII signal guidance. Prior defenses are either non-learnable or agnostic to perturbation direction. TextCrafter instead provides a directional protective policy that balances privacy and utility. Under strong privacy settings, TextCrafter maintains 70% classification accuracy on four datasets and consistently outperforms Gaussian/LDP baselines across lower privacy budgets, demonstrating a superior privacy-utility trade-off.
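To make "noise injection orthogonal to user embeddings" concrete, here is a minimal geometric sketch under stated assumptions. It is not TextCrafter's method. TextCrafter learns the perturbation direction with an RL policy guided by cluster priors and a PII classifier. This sketch replaces that learned policy with a random direction drawn from the orthogonal complement of the embedding. The function names `orthogonal_noise` and `perturb` and the `scale` parameter are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def orthogonal_noise(embedding, scale, rng=None):
    """Hypothetical helper: sample isotropic Gaussian noise, project out its
    component along `embedding`, and rescale the remainder to L2 norm `scale`.
    (A stand-in for a learned direction, not the paper's policy.)"""
    rng = rng or random.Random()
    if len(embedding) < 2:
        raise ValueError("need dimension >= 2 for a non-trivial orthogonal complement")
    e_norm_sq = dot(embedding, embedding)
    if e_norm_sq == 0:
        raise ValueError("embedding must be non-zero")
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in embedding]
        coef = dot(g, embedding) / e_norm_sq          # component of g along e
        r = [gi - coef * ei for gi, ei in zip(g, embedding)]
        r_norm = norm(r)
        if r_norm > 1e-12:                             # resample in the degenerate case
            return [scale * ri / r_norm for ri in r]


def perturb(embedding, scale, rng=None):
    # Released embedding = original + noise lying entirely off the embedding's own axis.
    n = orthogonal_noise(embedding, scale, rng)
    return [ei + ni for ei, ni in zip(embedding, n)]
```

Because the added vector is orthogonal to the embedding e, the perturbed vector has norm sqrt(||e||^2 + scale^2). Its cosine similarity to the original is exactly ||e|| / sqrt(||e||^2 + scale^2), so the angular distortion depends only on `scale`. Isotropic Gaussian noise, by contrast, also spends part of its budget moving e along its own direction.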
Key Contributions
- RL-learned, geometry-aware noise injection orthogonal to user embeddings that suppresses inversion without isotropic utility loss
- Cluster priors and a trained PII classifier that provide directional guidance, steering protective perturbations away from sensitive embedding subspaces
- TextCrafter achieves ≥70% downstream classification accuracy at strong privacy (BLEU<3, ROUGE-L<15), outperforming Gaussian and metric-LDP baselines across four datasets
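The source does not say how the Gaussian and metric-LDP baselines were configured. The sketch below shows one common way each is applied to an embedding vector, for intuition about what TextCrafter is compared against. The metric-LDP variant assumed here is the Euclidean mechanism with noise density proportional to exp(-epsilon * ||z||_2). It is sampled as a uniform direction on the unit sphere times a radius drawn from Gamma(shape=d, scale=1/epsilon). All function names are hypothetical.

```python
import math
import random


def gaussian_noise(dim, sigma, rng):
    # Isotropic Gaussian baseline: every direction, including the embedding's own, is perturbed.
    return [rng.gauss(0.0, sigma) for _ in range(dim)]


def metric_ldp_noise(dim, epsilon, rng):
    # Assumed metric-LDP mechanism: density proportional to exp(-epsilon * ||z||_2).
    # Direction: normalized Gaussian (uniform on the sphere).
    # Radius: Gamma(shape=dim, scale=1/epsilon).
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        g_norm = math.sqrt(sum(x * x for x in g))
        if g_norm > 0:
            break
    radius = rng.gammavariate(dim, 1.0 / epsilon)
    return [radius * x / g_norm for x in g]


def privatize(embedding, noise):
    # Release the noisy embedding instead of the original.
    return [e + n for e, n in zip(embedding, noise)]
```

Under this mechanism the expected noise norm is d / epsilon. A lower privacy budget (smaller epsilon) therefore means proportionally more noise in every direction, which is where a direction-aware policy would be expected to preserve more utility.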
🛡️ Threat Analysis
TextCrafter directly defends against embedding inversion attacks (Vec2Text) where an adversary reconstructs original user text from intermediate LLM embedding representations. Embedding inversion — recovering text/data from embedding vectors — is explicitly listed under ML03. The adversary threat model, PII reconstruction risk, and the defense mechanism all map squarely to the data reconstruction threat.