
OSNIP: Breaking the Privacy-Utility-Efficiency Trilemma in LLM Inference via Obfuscated Semantic Null Space

Zhiyuan Cao 1,2,3, Zeyu Ma 1,2, Chenhao Yang 1,3, Han Zheng 4, Mingang Chen 1

0 citations · 43 references · arXiv

Published on arXiv: 2601.22752

Model Inversion Attack (OWASP ML Top 10 — ML03)

Sensitive Information Disclosure (OWASP LLM Top 10 — LLM06)

Key Finding

OSNIP sharply reduces adversarial input reconstruction attack success rates across 12 benchmarks while maintaining strong model utility under strict security constraints.

OSNIP

Novel technique introduced


We propose Obfuscated Semantic Null Space Injection for Privacy (OSNIP), a lightweight client-side encryption framework for privacy-preserving LLM inference. Generalizing the geometric intuition of linear kernels to the high-dimensional latent space of LLMs, we formally define the "Obfuscated Semantic Null Space", a high-dimensional regime that preserves semantic fidelity while enforcing near-orthogonality to the original embedding. By injecting perturbations that project the original embedding into this space, OSNIP ensures privacy without any post-processing. Furthermore, OSNIP employs a key-dependent stochastic mapping that synthesizes individualized perturbation trajectories unique to each user. Evaluations on 12 generative and classification benchmarks show that OSNIP achieves state-of-the-art performance, sharply reducing attack success rates while maintaining strong model utility under strict security constraints.


Key Contributions

  • Formal definition of the 'Obfuscated Semantic Null Space' — a high-dimensional embedding regime that is geometrically near-orthogonal to the original input yet semantically equivalent from the LLM's perspective, enabling perturbation-based obfuscation without post-processing
  • Key-dependent stochastic perturbation mapping that synthesizes individualized obfuscation trajectories per user, preventing correlation attacks across users
  • Lightweight client-side framework evaluated on 12 generative and classification benchmarks, achieving state-of-the-art attack success rate reduction while maintaining model utility
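The geometric idea behind the first two bullets — a key-seeded perturbation that makes the obfuscated embedding near-orthogonal to the original — can be sketched numerically. The function below is a simplified illustration, not the paper's algorithm: the name `osnip_obfuscate`, the Gaussian perturbation, and the scale parameter `alpha` are all assumptions.

```python
import numpy as np

def osnip_obfuscate(e: np.ndarray, user_key: int, alpha: float = 10.0) -> np.ndarray:
    """Hypothetical sketch: add a key-seeded perturbation orthogonal to e,
    driving the result toward near-orthogonality with the original embedding."""
    rng = np.random.default_rng(user_key)        # key-dependent stochastic mapping
    p = rng.standard_normal(e.shape)             # random direction, unique per key
    p -= (p @ e) / (e @ e) * e                   # project out the e-component, so p ⟂ e
    p *= alpha * np.linalg.norm(e) / np.linalg.norm(p)  # |p| = alpha * |e|
    return e + p

e = np.random.default_rng(0).standard_normal(768)   # stand-in for a token embedding
e_obf = osnip_obfuscate(e, user_key=42)
cos = float(e @ e_obf / (np.linalg.norm(e) * np.linalg.norm(e_obf)))
# cos = 1 / sqrt(1 + alpha**2), i.e. roughly 0.0995 for alpha = 10
```

Because the perturbation is orthogonal to `e`, the cosine similarity between original and obfuscated embedding is exactly 1/√(1 + α²), so larger α pushes the result deeper into near-orthogonality; the per-user key seeds the RNG, so two users obfuscating the same input get uncorrelated trajectories.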

🛡️ Threat Analysis

Model Inversion Attack

The primary threat model is an adversary (the cloud server or an eavesdropper) performing embedding inversion — reconstructing the user's original private text from the obfuscated embeddings sent during LLM inference. OSNIP defends against this by injecting perturbations that move the embedding into the near-orthogonal 'Obfuscated Semantic Null Space', and the paper evaluates attack success rates against this reconstruction threat.
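The reconstruction threat can be made concrete with a toy nearest-neighbor inversion attack. Everything here is a hypothetical illustration under stated assumptions — the toy vocabulary, the `invert` helper, and the perturbation scale are not the attack or defense actually evaluated in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = rng.standard_normal((1000, 64))       # toy token-embedding table (hypothetical)

def invert(embedding: np.ndarray, table: np.ndarray) -> int:
    """Nearest-neighbor inversion: guess the token whose embedding is most similar."""
    sims = table @ embedding
    sims /= np.linalg.norm(table, axis=1) * np.linalg.norm(embedding)
    return int(np.argmax(sims))

token = 123
e = vocab[token]
recovered = invert(e, vocab)                  # plaintext embedding: attack succeeds

# Simplified OSNIP-style defense: key-seeded perturbation orthogonal to e
key_rng = np.random.default_rng(42)           # user key as RNG seed (assumption)
p = key_rng.standard_normal(64)
p -= (p @ e) / (e @ e) * e                    # make p orthogonal to e
e_obf = e + 10.0 * (np.linalg.norm(e) / np.linalg.norm(p)) * p
recovered_obf = invert(e_obf, vocab)          # inversion now almost always fails
```

After obfuscation, the true token's cosine similarity to the sent embedding drops to about 0.1, below the incidental similarity of many unrelated vocabulary rows, so the nearest-neighbor guess is very unlikely to hit the original token.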


Details

Domains: nlp
Model Types: llm, transformer
Threat Tags: inference_time, black_box
Datasets: 12 generative and classification benchmarks (unnamed in abstract/partial body)
Applications: llm-as-a-service inference, cloud llm api privacy, privacy-preserving nlp