CompressionAttack: Exploiting Prompt Compression as a New Attack Surface in LLM-Powered Agents
Zesen Liu , Zhixiang Zhang , Yuchong Xie , Dongdong She
Published on arXiv
2510.22963
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
CompressionAttack achieves an average attack success rate of up to 83% and 87% across two tasks against multiple LLMs, while remaining stealthy and transferable across compression modules.
CompressionAttack (HardCom / SoftCom)
Novel technique introduced
LLM-powered agents often use prompt compression to reduce inference costs, but this introduces a new security risk. Compression modules, which are optimized for efficiency rather than safety, can be manipulated by adversarial inputs, causing semantic drift and altering LLM behavior. This work identifies prompt compression as a novel attack surface and presents CompressionAttack, the first framework to exploit it. CompressionAttack includes two strategies: HardCom, which uses discrete adversarial edits for hard compression, and SoftCom, which performs latent-space perturbations for soft compression. Experiments on multiple LLMs show up to an average ASR of 83% and 87% in two tasks, while remaining highly stealthy and transferable. Case studies in three practical scenarios confirm real-world impact, and current defenses prove ineffective, highlighting the need for stronger protections.
Key Contributions
- Identifies prompt compression as a novel, previously unexplored attack surface in LLM-powered agent pipelines.
- Proposes HardCom (discrete adversarial edits for hard/extractive compression) and SoftCom (latent-space perturbations for soft/neural compression) as two complementary attack strategies.
- Demonstrates up to 83–87% average attack success rate across multiple LLMs and three practical deployment scenarios, with existing defenses proving ineffective.
🛡️ Threat Analysis
SoftCom performs gradient-based latent-space perturbations on the compression model (an ML component), and HardCom applies discrete adversarial edits — both are input manipulation attacks targeting the compression module as an ML inference-time attack surface, causing semantic drift that alters downstream LLM outputs.