
Attacks on Approximate Caches in Text-to-Image Diffusion Models

Desen Sun, Shuncheng Jie, Sihang Liu


Published on arXiv (2508.20424)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Three remote attacks on approximate caching in diffusion model serving systems are demonstrated, including prompt recovery from cache hits and persistent logo injection into generated images for victim users whose prompts match poisoned cache entries.


Diffusion models are a powerful class of generative models that produce images and other content from user prompts, but they are computationally intensive. To mitigate this cost, recent academic and industry work has adopted approximate caching, which reuses intermediate states from similar prompts stored in a cache. While efficient, this optimization introduces new security risks by breaking isolation among users. This paper provides a comprehensive assessment of the security vulnerabilities introduced by approximate caching. First, we demonstrate a remote covert channel established through the approximate cache, where a sender injects prompts with special keywords into the cache system and a receiver can recover them even days later, allowing the two to exchange information. Second, we introduce a prompt stealing attack that uses the approximate cache, where an attacker can recover existing cached prompts from cache hits. Finally, we introduce a poisoning attack that embeds the attacker's logos into a previously stolen prompt, causing unexpected logo rendering for requests that hit the poisoned cache entries. These attacks are all performed remotely through the serving system, demonstrating severe security vulnerabilities in approximate caching. The code for this work is available.
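To make the isolation break concrete, here is a minimal sketch of an approximate prompt cache shared across users. It is not the paper's implementation: the class name `ApproxCache`, the bag-of-words "embedding", and the similarity threshold are all illustrative stand-ins for a real text encoder and the serving system's tuned threshold.

```python
# Toy approximate prompt cache shared by all users. A lookup returns a cached
# intermediate state whenever the new prompt's embedding is similar enough to
# a stored one. All names and the 0.8 threshold are illustrative assumptions.
from collections import Counter
import math

THRESHOLD = 0.8  # assumed similarity cutoff for a cache hit


def embed(prompt: str) -> Counter:
    """Crude stand-in for a real text encoder: bag-of-words token counts."""
    return Counter(prompt.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class ApproxCache:
    """One cache for every user -- the source of the isolation break."""

    def __init__(self):
        self.entries = []  # list of (embedding, cached intermediate state)

    def insert(self, prompt: str, state) -> None:
        self.entries.append((embed(prompt), state))

    def lookup(self, prompt: str):
        e = embed(prompt)
        for emb, state in self.entries:
            if cosine(e, emb) >= THRESHOLD:
                return state  # HIT: reuses another user's intermediate state
        return None           # MISS: a full diffusion run is required


cache = ApproxCache()
cache.insert("a red fox in the snow", state="victim_intermediate_latent")

# A different user's similar prompt hits the victim's cached entry:
print(cache.lookup("a red fox in snow"))   # -> victim_intermediate_latent
print(cache.lookup("a city at night"))     # -> None (miss)
```

Because the hit/miss outcome depends on what other users have cached, an attacker who can observe it (e.g., via latency or output similarity) learns something about other users' prompts, which is the foothold all three attacks build on.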


Key Contributions

  • Remote covert channel exploiting approximate cache hit/miss behavior to transmit information between users, persistent over days
  • Prompt stealing attack that recovers other users' cached text prompts by probing cache hit patterns
  • Cache poisoning attack that injects attacker logos into cached diffusion intermediate states, corrupting image outputs for subsequent matching prompts
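The covert channel in the first contribution can be sketched in a few lines. This is a simplified model, not the paper's protocol: the probe-prompt format is invented, a Python set stands in for the shared cache, and set membership stands in for the hit signal that a real receiver would infer from latency or output similarity.

```python
# Covert-channel sketch over a shared cache: the sender encodes each bit by
# issuing (or withholding) a pre-agreed probe prompt; the receiver later
# replays the probes and reads hit=1 / miss=0. Exact-match membership in a
# set stands in for the approximate-cache hit signal (an assumption).

cache = set()  # stand-in for the serving system's shared approximate cache


def probe_prompt(i: int) -> str:
    # Pre-agreed, unlikely-to-collide prompt for bit position i (illustrative)
    return f"zq covert beacon {i} violet heron"


def send(bits: list) -> None:
    """Sender: a request for probe i populates the cache, encoding bit 1."""
    for i, bit in enumerate(bits):
        if bit:
            cache.add(probe_prompt(i))


def receive(n: int) -> list:
    """Receiver: replay the probes; a cache hit decodes as 1, a miss as 0."""
    return [1 if probe_prompt(i) in cache else 0 for i in range(n)]


send([1, 0, 1, 1])
print(receive(4))  # -> [1, 0, 1, 1]
```

Because cached entries persist, the receiver does not need to be online at the same time as the sender, which matches the paper's observation that the channel survives for days.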

🛡️ Threat Analysis

Output Integrity Attack

The poisoning attack directly corrupts image outputs by embedding attacker logos into cached intermediate states, manipulating what outputs users receive. The prompt stealing attack extracts other users' private prompts by observing cache-hit behavior (output-observable information leakage). The covert channel encodes information in observable cache-hit/miss patterns in model outputs. All three attacks compromise the integrity and isolation of model outputs through the serving infrastructure.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
black_box, inference_time, targeted
Applications
text-to-image generation, diffusion model serving systems