TextCrafter: Optimization-Calibrated Noise for Defending Against Text Embedding Inversion

Text embedding inversion attacks reconstruct original sentences from latent representations, posing severe privacy threats in collaborative inference and edge computing. We propose TextCrafter, an optimization-based adversarial perturbation mechanism that combines RL-learned, geometry-aware noise injection orthogonal to user embeddings with cluster priors and PII signal guidance to suppress inversion while preserving task utility. Unlike prior defenses, which are either non-learnable or agnostic to perturbation direction, TextCrafter provides a directional protective policy that balances privacy and utility. Under a strong privacy setting, TextCrafter maintains 70% classification accuracy on four datasets and consistently outperforms Gaussian/LDP baselines across lower privacy budgets, demonstrating a superior privacy-utility trade-off.
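To make the geometric idea concrete, the following is a minimal sketch of noise injection orthogonal to a user embedding. It only illustrates the projection step: the RL-learned policy, cluster priors, and PII guidance described above are not modeled, and a random Gaussian direction stands in for the learned one. The function names (`orthogonal_noise`, `perturb`) and the `scale` parameter are hypothetical and do not come from the paper.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    """Euclidean (L2) norm."""
    return math.sqrt(dot(v, v))


def orthogonal_noise(embedding, scale, rng):
    """Sample Gaussian noise, remove its component along `embedding`,
    and rescale the remainder to L2 norm `scale`.

    Assumption: a random direction replaces the learned policy output.
    """
    e_sq = dot(embedding, embedding)
    if e_sq == 0.0:
        raise ValueError("embedding must be non-zero")
    raw = [rng.gauss(0.0, 1.0) for _ in embedding]
    # Projection coefficient of the raw noise onto the embedding direction.
    coeff = dot(raw, embedding) / e_sq
    ortho = [r - coeff * e for r, e in zip(raw, embedding)]
    n = norm(ortho)
    if n == 0.0:
        raise ValueError("no orthogonal direction available")
    return [scale * o / n for o in ortho]


def perturb(embedding, scale, seed=0):
    """Return the perturbed embedding and the noise that was added."""
    rng = random.Random(seed)
    noise = orthogonal_noise(embedding, scale, rng)
    protected = [e + d for e, d in zip(embedding, noise)]
    return protected, noise


if __name__ == "__main__":
    emb = [0.5, -1.0, 2.0, 0.25]
    protected, noise = perturb(emb, scale=0.3)
    print("noise . embedding =", dot(noise, emb))  # close to zero
    print("noise norm        =", norm(noise))      # close to 0.3
```

Because the noise has no component along the embedding, the projection of the perturbed vector onto the original direction is unchanged, whereas isotropic Gaussian noise would also perturb that component.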

Key Contributions

RL-learned, geometry-aware noise injection orthogonal to user embeddings that suppresses inversion without isotropic utility loss
Cluster priors and a trained PII classifier that provide directional guidance, steering protective perturbations away from sensitive embedding subspaces
TextCrafter achieves ≥70% downstream classification accuracy at strong privacy (BLEU<3, ROUGE-L<15), outperforming Gaussian and metric-LDP baselines across four datasets
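The privacy thresholds above are stated in terms of reconstruction-quality metrics. As a rough illustration of what a ROUGE-L score measures, here is a sketch of the standard LCS-based ROUGE-L F1 between an original sentence and an attacker's reconstruction. It assumes lowercase whitespace tokenization and a 0-100 scale; the paper's exact evaluation settings are not given here, so treat both as assumptions. The function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 on a 0-100 scale.

    Assumption: lowercase whitespace tokenization, equal weight on
    precision and recall.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 100.0 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    original = "please send the report to alice by friday"
    reconstruction = "the weather is nice in the city today"
    print(rouge_l_f1(original, reconstruction))
```

A low score means the reconstruction shares little in-order token overlap with the original text, which is the sense in which a defense "suppresses inversion".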

🛡️ Threat Analysis

Model Inversion Attack

TextCrafter directly defends against embedding inversion attacks (e.g., Vec2Text), in which an adversary reconstructs original user text from intermediate LLM embedding representations. Embedding inversion, that is, recovering text or data from embedding vectors, is explicitly listed under ML03. The adversary threat model, the PII reconstruction risk, and the defense mechanism all map squarely to the data reconstruction threat.

Details

Domains

nlp

Model Types

llm, transformer

Threat Tags

black_box, inference_time

Datasets

SST-2, AG News, Enron email

Applications

2025 1 cit.
