Defense · 2026

CacheSolidarity: Preventing Prefix Caching Side Channels in Multi-tenant LLM Serving Systems

Panagiotis Georgios Pennas, Konstantinos Papaioannou, Marco Guarnieri, Thaleia Dimitra Doudali


Published on arXiv (2603.10726)

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

CacheSolidarity enables up to 70% higher cache reuse and 30% lower inference latency versus full user-isolation defenses while preventing prompt reconstruction via APC timing side channels.

CacheSolidarity

Novel technique introduced


Large Language Models (LLMs) rely on optimizations like Automatic Prefix Caching (APC) to accelerate inference. APC reuses the previously computed states for the beginning of a request (the prefix) whenever another request starts with the same text. While APC improves throughput, it introduces a timing side channel: cache hits are faster than misses, creating observable latency differences. In multi-tenant systems, attackers can exploit these differences to infer sensitive information, e.g., by incrementally reconstructing another user's request from observed hit/miss patterns. Current defenses take a sledgehammer approach: they disable APC and cache sharing entirely, isolating users and sacrificing efficiency for regular users. This paper presents CacheSolidarity, a system that secures multi-tenant LLM serving systems against APC side channels without sacrificing performance and efficiency. CacheSolidarity monitors cache reuse across users, flags suspicious sharing, and selectively isolates prefixes, restricting their reuse only when necessary. Evaluation shows that CacheSolidarity enables up to 70% higher cache reuse and 30% lower inference latency compared to existing defenses that isolate users. CacheSolidarity's lightweight design demonstrates that security in LLM serving does not have to come at the cost of reduced performance or prohibitive overhead.
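
The attack described above can be made concrete with a short sketch. Nothing below comes from the paper's artifact: query_llm, HIT_THRESHOLD_MS, and the candidate-sweep loop are illustrative assumptions about how an attacker could time a shared APC cache.

```python
# Hypothetical sketch of the incremental prompt-reconstruction probe.
# query_llm, HIT_THRESHOLD_MS, and the candidate sweep are assumptions,
# not details taken from the paper.
import time

HIT_THRESHOLD_MS = 50.0  # assumed cutoff separating cache-hit from cache-miss latency

def probe(prompt: str, query_llm) -> bool:
    """Return True if the prompt's prefix appears to be cached (fast response)."""
    start = time.perf_counter()
    query_llm(prompt)  # query_llm is a stand-in for the victim-facing serving API
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms < HIT_THRESHOLD_MS  # hit => some user already sent this prefix

def reconstruct(seed: str, candidates: list[str], query_llm, max_steps: int = 32) -> str:
    """Extend a known seed one candidate block at a time, guided by hit/miss timing."""
    prompt = seed
    for _ in range(max_steps):
        for cand in candidates:
            if probe(prompt + cand, query_llm):  # a cache hit leaks the next block
                prompt += cand
                break
        else:
            return prompt  # no candidate produced a hit; reconstruction stalls
    return prompt
```

A real probe would time the first streamed token rather than the full completion and repeat each measurement to average out network noise, but the hit/miss decision procedure is the same.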


Key Contributions

  • Characterizes the APC timing side-channel threat in multi-tenant LLM serving, showing attackers can incrementally reconstruct other users' prompts via cache hit/miss latency differences
  • Proposes CacheSolidarity, which monitors cross-user cache reuse patterns, flags suspicious prefix sharing, and applies selective isolation only where necessary (sketched in code after this list)
  • Achieves up to 70% higher cache reuse and 30% lower inference latency compared to user-isolation defenses that fully disable APC
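
As a rough illustration of the selective-isolation idea, the sketch below tracks cross-user reuse per prefix and withdraws sharing only for prefixes that look like they are being probed. The paper's actual monitoring and flagging policy is not described in this summary, so SelectiveIsolationCache, SUSPICION_LIMIT, and the hit-counting heuristic are all assumptions.

```python
# Toy sketch of selective prefix isolation under an assumed heuristic:
# flag a prefix once users other than its creator re-hit it too often.
# SelectiveIsolationCache and SUSPICION_LIMIT are illustrative names,
# not the paper's mechanism.
from collections import defaultdict

SUSPICION_LIMIT = 3  # assumed cross-user hit count before a prefix is isolated

class SelectiveIsolationCache:
    def __init__(self):
        self.owner = {}                     # prefix -> user who first inserted it
        self.cross_hits = defaultdict(int)  # prefix -> cross-user reuse count
        self.isolated = set()               # prefixes restricted to their owner

    def lookup(self, prefix: str, user: str) -> bool:
        """Return True on a shareable cache hit; isolated prefixes miss for others."""
        if prefix not in self.owner:
            self.owner[prefix] = user       # cold miss: record the inserting user
            return False
        if user == self.owner[prefix]:
            return True                     # owners always hit their own prefixes
        if prefix in self.isolated:
            return False                    # flagged prefix: deny cross-user reuse
        self.cross_hits[prefix] += 1
        if self.cross_hits[prefix] >= SUSPICION_LIMIT:
            self.isolated.add(prefix)       # selectively isolate only this prefix
            return False                    # deny the request that tripped the limit
        return True                         # benign-looking sharing stays enabled
```

The point of the design is that prefixes nobody probes keep full APC sharing, which is where the reported 70% cache-reuse and 30% latency gains over blanket user isolation come from.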

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box, inference_time
Applications
llm serving systems, multi-tenant cloud inference