Towards Confidential and Efficient LLM Inference with Dual Privacy Protection
Honglan Yu 1,2,3, Yibin Wang 1,2,3, Feifei Dai 1, Dong Liu 1, Haihui Fan 1, Xiaoyan Gu 1,2,3
Published on arXiv (2509.09091)
Sensitive Information Disclosure
OWASP LLM Top 10 — LLM06
Key Finding
CMIF reduces TEE-induced inference overhead from 6.5× to 1.54× while preserving user input privacy against a curious inference server.
CMIF (Confidential and efficient Model Inference Framework)
Novel technique introduced
CPU-based trusted execution environments (TEEs) and differential privacy (DP) are widely used for private inference. Because inference inside TEEs is slow, researchers adopt partition-based approaches that offload linear model components to GPUs. However, the dense nonlinear layers of large language models (LLMs) cause significant communication overhead between TEEs and GPUs. DP-based approaches add random noise to protect data privacy, but the noise degrades LLM performance and semantic understanding. To overcome these drawbacks, this paper proposes CMIF, a Confidential and efficient Model Inference Framework. CMIF deploys the embedding layer confidentially in a client-side TEE and the subsequent layers on GPU servers. Meanwhile, it optimizes the Report-Noisy-Max mechanism to protect sensitive inputs with only a slight decrease in model performance. Extensive experiments on Llama-series models demonstrate that CMIF reduces the additional inference overhead of TEEs while preserving user data privacy.
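The partition the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimensions, the placeholder server layer, and the function names (`client_embed`, `server_forward`) are all assumptions; a real deployment would run the client step inside an actual TEE enclave.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the paper targets Llama-series models.
VOCAB, D_MODEL = 100, 16
embedding_table = rng.normal(size=(VOCAB, D_MODEL))  # held client-side, inside the TEE

def client_embed(token_ids):
    """Client-side TEE step: map raw token ids to embeddings, so the
    plaintext tokens never leave the trusted boundary."""
    return embedding_table[token_ids]

def server_forward(hidden):
    """Untrusted GPU server: runs the remaining (linear and nonlinear)
    layers on embeddings only. A single stand-in layer is used here."""
    w = np.ones((D_MODEL, D_MODEL)) / D_MODEL  # placeholder weights
    return np.maximum(hidden @ w, 0.0)

tokens = np.array([3, 17, 42])
out = server_forward(client_embed(tokens))  # only embeddings cross the boundary
print(out.shape)
```

Note the design point this illustrates: because only the embedding layer lives in the TEE, the client sends a single tensor to the server per query, avoiding the repeated TEE-to-GPU round trips that partition-based approaches incur at every nonlinear layer.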
Key Contributions
- CMIF framework that deploys the embedding layer + DP sanitization inside a client-side TEE while offloading remaining LLM computation to GPU servers, reducing TEE inference overhead from 6.5× to 1.54× over baseline.
- Optimized Report-Noisy-Max (RNM) text sanitization mechanism that replaces sensitive tokens with semantically coherent alternatives under differential privacy, outperforming prior DP-based methods at comparable privacy budgets.
- Dual-layer privacy protection combining TEE hardware isolation (for model/sanitizer confidentiality) with DP input obfuscation (for user query privacy), evaluated on Llama-series models.
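The Report-Noisy-Max selection underlying the sanitization step can be sketched as below. This is a generic RNM sketch under stated assumptions, not CMIF's optimized variant: the toy embeddings, cosine-similarity utility, and the `sanitize` helper are all hypothetical; the Laplace scale of 2·sensitivity/ε is the standard choice that makes the selection ε-differentially private.

```python
import numpy as np

rng = np.random.default_rng(1)

def report_noisy_max(scores, epsilon, sensitivity=1.0):
    """Report-Noisy-Max: add Laplace(2*sensitivity/epsilon) noise to each
    candidate's utility score and release only the argmax index."""
    noise = rng.laplace(scale=2.0 * sensitivity / epsilon, size=len(scores))
    return int(np.argmax(np.asarray(scores) + noise))

# Hypothetical setup: replace a sensitive token with a semantically
# close candidate, scored by cosine similarity of toy 2-d embeddings.
emb = {"London": np.array([1.0, 0.2]),
       "Paris":  np.array([0.9, 0.3]),
       "Berlin": np.array([0.8, 0.1]),
       "banana": np.array([0.0, 1.0])}

def sanitize(token, candidates, epsilon):
    """Pick a DP replacement for `token` from `candidates` via RNM."""
    v = emb[token]
    scores = [float(v @ emb[c] / (np.linalg.norm(v) * np.linalg.norm(emb[c])))
              for c in candidates]
    return candidates[report_noisy_max(scores, epsilon)]

cands = ["Paris", "Berlin", "banana"]
replacement = sanitize("London", cands, epsilon=8.0)
print(replacement)
```

Because the utility is semantic similarity, high-scoring candidates are coherent substitutes, which is how this style of sanitization avoids the output-quality loss of adding raw noise to embeddings; a larger ε concentrates the choice on the most similar candidate, a smaller ε spreads it out.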