Defense · 2025

Selective KV-Cache Sharing to Mitigate Timing Side-Channels in LLM Inference

Kexin Chu 1, Zecheng Lin 2,1, Dawei Xiang 1, Zixu Shen 1, Jianchang Su 1, Cheng Chu 3, Yiwei Yang 4, Wenhui Zhang 5, Wenfei Wu 5, Wei Zhang 1


Published on arXiv: 2508.08438

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

SafeKV reduces time-to-first-token overhead vs. full isolation by up to 40.58% and raises throughput by up to 2.66x while enforcing cross-tenant privacy in multi-tenant LLM serving.

SafeKV

Novel technique introduced


Global KV-cache sharing is an effective optimization for accelerating large language model (LLM) inference, yet it introduces an API-visible timing side channel that lets adversaries infer sensitive user inputs from shared entries, leading to cross-tenant privacy risks. To address this problem, we introduce SafeKV (Secure and Flexible KV-cache Sharing), a system-level co-design of privacy enforcement and KV-cache management. SafeKV integrates lightweight detection and isolation directly into the serving runtime to eliminate cross-tenant reuse of sensitive KV-cache blocks under our threat model, while recovering most of the performance benefits of global sharing. Our key contributions are: (1) a three-tier asynchronous detection pipeline that decouples privacy classification from inference and supports streaming workloads, (2) a unified radix-tree-based memory manager with path compression and sensitivity-aware eviction for scalable selective isolation, and (3) an RDR-guided (Reuse Diversity Ratio) runtime safeguard that detects and bounds residual leakage. On large LLM backends, SafeKV reduces the time-to-first-token (TTFT) overhead compared to full isolation by up to 40.58% and raises throughput by up to 2.66x. Overall, SafeKV restores the efficiency of KV reuse while enforcing strong, practical privacy for multi-tenant LLM inference.
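The timing side channel described above can be made concrete with a toy model (an illustrative sketch, not code from the paper): when KV-cache entries are shared globally across tenants, a cache hit yields a much lower time-to-first-token than a cold prefill, so an attacker who replays a guessed prefix can tell from latency alone whether another tenant already submitted it. The class and latency constants below are hypothetical.

```python
# Toy illustration of the cross-tenant timing side channel that SafeKV
# mitigates. A globally shared prefix cache makes TTFT measurably lower
# when ANY tenant has already prefilled the same prompt prefix.

class SharedPrefixCache:
    """Hypothetical KV-cache keyed by prompt prefix, shared across tenants."""

    def __init__(self, prefill_cost=0.050, cached_cost=0.005):
        self.entries = set()
        self.prefill_cost = prefill_cost  # simulated cold-prefill latency (s)
        self.cached_cost = cached_cost    # simulated cache-hit latency (s)

    def time_to_first_token(self, prefix):
        # Return the simulated TTFT and insert the prefix into the cache.
        hit = prefix in self.entries
        self.entries.add(prefix)
        return self.cached_cost if hit else self.prefill_cost


cache = SharedPrefixCache()
# A victim tenant submits a sensitive prompt, populating the shared cache.
cache.time_to_first_token("patient record: John Doe, diagnosis ...")
# An attacker probes the same guessed prefix and observes a far lower TTFT,
# learning that some other tenant already sent it.
probe = cache.time_to_first_token("patient record: John Doe, diagnosis ...")
fresh = cache.time_to_first_token("an unrelated, never-seen prefix")
assert probe < fresh  # the timing gap leaks cross-tenant cache state
```

SafeKV's approach, per the abstract, is to keep this sharing for non-sensitive entries while detecting sensitive blocks and excluding them from cross-tenant reuse, rather than isolating every tenant's cache wholesale.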


Key Contributions

  • A three-tier asynchronous detection pipeline that decouples privacy classification from inference and supports streaming workloads
  • A unified radix-tree-based memory manager with path compression and sensitivity-aware eviction for scalable selective KV-cache isolation
  • An RDR-guided (Reuse Diversity Ratio) runtime safeguard that detects and bounds residual leakage, reducing TTFT overhead by up to 40.58% vs. full isolation while raising throughput by up to 2.66x
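The second contribution, selective isolation in a shared radix tree, can be sketched as follows. This is a hedged toy model under assumed semantics (the function names, the per-tenant keying scheme, and the omission of path compression and eviction are all illustrative, not SafeKV's actual design): blocks classified sensitive are keyed per-tenant so they can never produce a cross-tenant cache hit, while non-sensitive blocks remain globally shareable in the same prefix tree.

```python
# Hedged sketch of selective KV-cache isolation in one prefix tree.
# Sensitive blocks get a tenant-private key; public blocks share one key.
# Names and structure are illustrative assumptions, not SafeKV's API.

class KVNode:
    def __init__(self):
        self.children = {}  # (token, tenant_or_None) -> KVNode


def insert(root, tokens, tenant, sensitive):
    """Cache a token sequence, privately if flagged sensitive."""
    node = root
    for tok in tokens:
        # Tenant-private key for sensitive blocks; shared key otherwise.
        key = (tok, tenant) if sensitive else (tok, None)
        node = node.children.setdefault(key, KVNode())
    return node


def reusable_prefix_len(root, tokens, tenant):
    """How many leading tokens this tenant can reuse from the cache."""
    node, n = root, 0
    for tok in tokens:
        child = node.children.get((tok, None)) or node.children.get((tok, tenant))
        if child is None:
            break
        node, n = child, n + 1
    return n


root = KVNode()
insert(root, ["ssn", "123"], tenant="A", sensitive=True)
insert(root, ["hello", "world"], tenant="A", sensitive=False)
# Tenant B reuses the public prefix but sees no trace of A's sensitive one,
# so probing it yields a cold-path timing, closing the side channel.
assert reusable_prefix_len(root, ["hello", "world"], "B") == 2
assert reusable_prefix_len(root, ["ssn", "123"], "B") == 0
assert reusable_prefix_len(root, ["ssn", "123"], "A") == 2
```

The design intuition matches the key finding above: only the (typically small) sensitive fraction of blocks pays the isolation cost, which is why SafeKV recovers most of the throughput and TTFT benefit of global sharing relative to isolating every tenant's cache.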

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time
Applications
multi-tenant llm inference serving, llm api services