ADAM: A Systematic Data Extraction Attack on Agent Memory via Adaptive Querying
Xingyu Lyu 1, Jianfeng He 2,3, Ning Wang 4, Yidan Hu 5, Tao Li 6, Danjue Chen 7, Shixiong Li 1, Yimin Chen 1
Published on arXiv
2604.09747
Model Inversion Attack
OWASP ML Top 10 — ML03
Sensitive Information Disclosure
OWASP LLM Top 10 — LLM06
Key Finding
Achieves up to 100% attack success rate extracting sensitive information from LLM agent memory, substantially outperforming state-of-the-art privacy attacks
ADAM
Novel technique introduced
Large Language Model (LLM) agents have seen rapid adoption and demonstrated remarkable capabilities across a wide range of applications. To improve reasoning and task execution, modern LLM agents often incorporate memory modules or retrieval-augmented generation (RAG) mechanisms, which let them draw on prior interactions or external knowledge. However, this design also introduces a class of critical privacy vulnerabilities: sensitive information stored in memory can be leaked through query-based attacks. Although such attacks are feasible, existing ones achieve only limited performance, with low attack success rates (ASR). In this paper, we propose ADAM, a novel privacy attack that estimates the data distribution of a victim agent's memory and employs an entropy-guided query strategy to maximize privacy leakage. Extensive experiments demonstrate that our attack substantially outperforms state-of-the-art attacks, achieving an ASR of up to 100%. These results underscore the urgent need for robust privacy-preserving methods for current LLM agents.
Key Contributions
- Novel adaptive querying attack (ADAM) that estimates the data distribution of the agent's memory and uses an entropy-guided query strategy
- Achieves up to 100% attack success rate, substantially outperforming prior privacy attacks on LLM agents
- Demonstrates critical privacy vulnerabilities in memory-augmented and RAG-based LLM agent architectures
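
The summary above does not give ADAM's actual algorithm, so the following is only a minimal illustrative sketch of what "distribution estimation plus entropy-guided querying" could look like in general. Everything in it is an assumption made for illustration: the topic labels, the Laplace smoothing, the hypothetical helper names (`shannon_entropy`, `estimate_distribution`, `pick_query`), and the rule of preferring the candidate query whose predicted outcome is most uncertain. It should not be read as the authors' method.

```python
import math
from collections import Counter


def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def estimate_distribution(observed_topics, topics, alpha=1.0):
    """Laplace-smoothed estimate of how leaked entries spread over topics.

    `observed_topics` lists the (hypothetical) topic label of each entry
    leaked so far; `alpha` is an assumed smoothing constant.
    """
    counts = Counter(observed_topics)
    total = len(observed_topics) + alpha * len(topics)
    return {t: (counts[t] + alpha) / total for t in topics}


def pick_query(candidates):
    """Pick the candidate whose predicted outcome distribution is most uncertain.

    `candidates` maps a query string to an assumed predicted distribution
    over topics. Maximum entropy is used here as a simple stand-in for
    "most informative next query".
    """
    return max(candidates, key=lambda q: shannon_entropy(candidates[q].values()))


if __name__ == "__main__":
    topics = ["medical", "finance", "travel"]
    estimate = estimate_distribution(["medical", "medical", "finance"], topics)
    candidates = {
        "q_narrow": {"medical": 0.90, "finance": 0.05, "travel": 0.05},
        "q_current_estimate": estimate,
    }
    print(estimate)
    print(pick_query(candidates))
```

In this toy setup the attacker would update `estimate` after every response and re-score the candidates, so the choice of the next query adapts to what has already been leaked. A real attack would also need a way to generate candidate queries and to predict their outcomes, both of which are left out here.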
🛡️ Threat Analysis
The attack extracts sensitive private information stored in agent memory or RAG systems by querying the agent, which makes it a model inversion / data extraction attack against a deployed system. The adversary reconstructs the stored data through adaptively chosen queries.