ER-MIA: Black-Box Adversarial Memory Injection Attacks on Long-Term Memory-Augmented Large Language Models
Mitchell Piehl, Zhaohan Xi, Zuobin Xiong, Pan He, Muchao Ye
Published on arXiv: 2602.15344
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Similarity-based retrieval in long-term memory-augmented LLMs constitutes a universal, system-level vulnerability exploitable under black-box access, with adversarially injected memories severely impairing multi-hop and temporal reasoning across diverse LLM and memory system configurations.
ER-MIA
Novel technique introduced
Large language models (LLMs) are increasingly augmented with long-term memory systems to overcome finite context windows and enable persistent reasoning across interactions. However, recent research shows that this memory introduces an additional attack surface, making memory-augmented LLMs more vulnerable than their stateless counterparts. In this paper, we present the first systematic study of black-box adversarial memory injection attacks that target the similarity-based retrieval mechanism in long-term memory-augmented LLMs. We introduce ER-MIA, a unified framework that exposes this vulnerability and formalizes two realistic attack settings: content-based attacks and question-targeted attacks. For each setting, ER-MIA provides composable attack primitives and ensemble attacks that achieve high success rates under minimal attacker assumptions. Extensive experiments across multiple LLMs and long-term memory systems demonstrate that similarity-based retrieval constitutes a fundamental, system-level vulnerability, revealing security risks that persist across memory designs and application scenarios.
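To make the targeted mechanism concrete, the sketch below shows a minimal similarity-based long-term memory store of the kind the abstract describes: entries are embedded, and a query retrieves the top-k most similar ones. The `embed`/`cosine` helpers are hypothetical toy stand-ins (bag-of-words instead of the learned dense encoders real systems use), not the paper's implementation.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words embedding; real memory systems use learned dense encoders.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MemoryStore:
    """Minimal long-term memory: store entries, retrieve top-k by similarity."""
    def __init__(self):
        self.entries = []

    def add(self, text):
        self.entries.append((text, embed(text)))

    def retrieve(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = MemoryStore()
store.add("Alice moved to Paris in 2021")
store.add("Bob likes tennis")
print(store.retrieve("Where does Alice live?", k=1))
```

Because ranking depends only on embedding proximity, anything an attacker can write into the store competes on equal footing with legitimate memories, which is the surface ER-MIA exploits.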
Key Contributions
- First systematic study of black-box adversarial memory injection attacks (AMIAs) against similarity-based retrieval in dynamic long-term memory-augmented LLMs, under realistic minimal-attacker-knowledge assumptions
- ER-MIA framework formalizing two attack settings (content-based and question-targeted) with composable attack primitives and ensemble attacks requiring no access to model parameters or retrieval system internals
- Empirical demonstration that similarity-based retrieval is a fundamental, system-level vulnerability that persists across multiple LLM architectures and long-term memory system designs
🛡️ Threat Analysis
ER-MIA crafts adversarial content specifically engineered to be embedding-close to legitimate memories, exploiting the similarity-based retrieval mechanism. This is adversarial content manipulation targeting an LLM-integrated retrieval system (analogous to adversarial RAG poisoning), which the OWASP guidelines explicitly classify under ML01.
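The question-targeted setting can be illustrated end to end with a toy example: an injected memory that echoes an anticipated user question becomes embedding-close to that question and outranks the legitimate memory at retrieval time. The `embed`/`cosine` helpers and the specific strings are hypothetical illustrations, not the paper's attack primitives.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words embedding; real systems use learned dense encoders.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

memories = [
    "Alice moved to Paris in 2021",                # legitimate memory
    "Where does Alice live? Alice lives in Oslo",  # injected: echoes the question
]

question = "Where does Alice live?"
q = embed(question)

# Rank memories by similarity to the anticipated question; the injected
# entry wins because it was crafted to be embedding-close to the query.
ranked = sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)
print(ranked[0])
```

Note that the attacker needs no access to model parameters or retrieval internals: black-box write access to the memory plus a guess at the query distribution is enough, which is why the paper frames this as a system-level rather than model-specific vulnerability.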