OSNIP: Breaking the Privacy-Utility-Efficiency Trilemma in LLM Inference via Obfuscated Semantic Null Space
Zhiyuan Cao 1,2,3, Zeyu Ma 1,2, Chenhao Yang 1,3, Han Zheng 4, Mingang Chen 1
1 Shanghai Key Laboratory of Computer Software Testing and Evaluating
Published on arXiv
2601.22752
Model Inversion Attack
OWASP ML Top 10 — ML03
Sensitive Information Disclosure
OWASP LLM Top 10 — LLM06
Key Finding
OSNIP sharply reduces adversarial input reconstruction attack success rates across 12 benchmarks while maintaining strong model utility under strict security constraints.
OSNIP
Novel technique introduced
We propose Obfuscated Semantic Null space Injection for Privacy (OSNIP), a lightweight client-side encryption framework for privacy-preserving LLM inference. Generalizing the geometric intuition of linear kernels to the high-dimensional latent space of LLMs, we formally define the "Obfuscated Semantic Null Space", a high-dimensional regime that preserves semantic fidelity while enforcing near-orthogonality to the original embedding. By injecting perturbations that project the original embedding into this space, OSNIP ensures privacy without any post-processing. Furthermore, OSNIP employs a key-dependent stochastic mapping that synthesizes individualized perturbation trajectories unique to each user. Evaluations on 12 generative and classification benchmarks show that OSNIP achieves state-of-the-art performance, sharply reducing attack success rates while maintaining strong model utility under strict security constraints.
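The core geometric move — blending a small component of the original embedding with a key-seeded direction orthogonal to it — can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name, the blend parameter `alpha`, and the use of a plain seeded RNG are all assumptions for demonstration.

```python
import math
import random

def obfuscate(embedding, user_key, alpha=0.1):
    # Hypothetical sketch of near-orthogonal obfuscation (names and parameters
    # are illustrative, not from the paper): mix a small weight of the original
    # direction with a key-seeded orthogonal direction.
    rng = random.Random(user_key)                       # key-dependent randomness
    noise = [rng.gauss(0.0, 1.0) for _ in embedding]
    norm_x = math.sqrt(sum(v * v for v in embedding))
    x_hat = [v / norm_x for v in embedding]
    # Gram-Schmidt: strip the noise component along the original direction
    proj = sum(n * u for n, u in zip(noise, x_hat))
    ortho = [n - proj * u for n, u in zip(noise, x_hat)]
    norm_o = math.sqrt(sum(v * v for v in ortho))
    ortho = [v / norm_o for v in ortho]
    beta = math.sqrt(1.0 - alpha * alpha)
    # restore the original magnitude so downstream layers see a plausible scale
    return [norm_x * (alpha * u + beta * o) for u, o in zip(x_hat, ortho)]

x = [1.0, 2.0, 3.0, 4.0]
y = obfuscate(x, user_key=42)
cos = sum(a * b for a, b in zip(x, y)) / (
    math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
)
print(round(cos, 6))  # cosine similarity equals alpha (0.1): near-orthogonal
```

Because the orthogonal component dominates (`alpha` small), the obfuscated vector is geometrically far from the original, which is what frustrates inversion; the paper's contribution is choosing that orthogonal regime so the LLM's interpretation of the input is nonetheless preserved.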
Key Contributions
- Formal definition of the 'Obfuscated Semantic Null Space' — a high-dimensional embedding regime that is geometrically near-orthogonal to the original input yet semantically equivalent from the LLM's perspective, enabling perturbation-based obfuscation without post-processing
- Key-dependent stochastic perturbation mapping that synthesizes individualized obfuscation trajectories per user, preventing correlation attacks across users
- Lightweight client-side framework evaluated on 12 generative and classification benchmarks, achieving state-of-the-art attack success rate reduction while maintaining model utility
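The second contribution — a key-dependent stochastic mapping — implies each user's perturbations are reproducible from their own key but uncorrelated across users. A plausible sketch, using HMAC-SHA256 for seed derivation (the paper's exact construction is not specified here; function names and the HMAC choice are assumptions):

```python
import hashlib
import hmac
import random

def user_perturbation(user_key: bytes, nonce: bytes, dim: int):
    # Hypothetical key-dependent stochastic mapping: derive a per-user,
    # per-session seed via HMAC-SHA256, then draw a Gaussian perturbation
    # trajectory from it. Illustrative only.
    seed = int.from_bytes(
        hmac.new(user_key, nonce, hashlib.sha256).digest()[:8], "big"
    )
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

# Same input context, different user keys -> distinct perturbation trajectories,
# so obfuscated embeddings cannot be correlated across users.
p_alice = user_perturbation(b"alice-secret", b"session-1", dim=8)
p_bob = user_perturbation(b"bob-secret", b"session-1", dim=8)
assert p_alice != p_bob
# The same user with the same key and nonce reproduces their own trajectory:
assert p_alice == user_perturbation(b"alice-secret", b"session-1", dim=8)
```

Deriving the seed from a keyed MAC rather than from the raw key means two users processing identical text still draw independent perturbations, which is the property the per-user trajectory claim rests on.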
🛡️ Threat Analysis
The primary threat model is an adversary (cloud server or eavesdropper) performing embedding inversion — reconstructing the user's original private text from the obfuscated embeddings sent during LLM inference. OSNIP defends against this by injecting perturbations into a near-orthogonal 'Semantic Null Space', and the paper evaluates attack success rates against this reconstruction threat.
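To make the threat concrete, a toy embedding-inversion attack can be simulated as nearest-neighbour cosine matching against a known vocabulary table — a crude stand-in for the learned inversion models evaluated in the paper. The vocabulary, dimensions, and vectors below are invented for illustration:

```python
import math
import random

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy vocabulary of "private" tokens with made-up embeddings.
rng = random.Random(0)
vocab = {w: [rng.gauss(0.0, 1.0) for _ in range(32)]
         for w in ["diagnosis", "salary", "address", "password"]}
secret = vocab["salary"]

# 1) Plaintext embedding: nearest-neighbour inversion succeeds trivially.
assert max(vocab, key=lambda w: cosine(vocab[w], secret)) == "salary"

# 2) Null-space-style obfuscation: replace the embedding with a key-seeded
#    direction made orthogonal to it via Gram-Schmidt.
key_rng = random.Random(1234)
noise = [key_rng.gauss(0.0, 1.0) for _ in secret]
proj = sum(n * s for n, s in zip(noise, secret)) / sum(s * s for s in secret)
obf = [n - proj * s for n, s in zip(noise, secret)]
assert abs(cosine(obf, secret)) < 1e-9  # near-orthogonal to the original
```

After obfuscation the cosine signal the attacker relied on is destroyed; OSNIP's evaluation measures exactly this kind of reconstruction failure, but against trained inversion attackers rather than this toy lookup.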