
Published on arXiv

2604.09747

Model Inversion Attack

OWASP ML Top 10 — ML03

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

Achieves up to 100% attack success rate extracting sensitive information from LLM agent memory, substantially outperforming state-of-the-art privacy attacks

ADAM

Novel technique introduced


Large Language Model (LLM) agents have been adopted rapidly and have demonstrated remarkable capabilities across a wide range of applications. To improve reasoning and task execution, modern LLM agents incorporate memory modules or retrieval-augmented generation (RAG) mechanisms, which let them draw on prior interactions or external knowledge. However, this design also introduces critical privacy vulnerabilities: sensitive information stored in memory can be leaked through query-based attacks. Existing attacks show that such leakage is feasible, but they often achieve only low attack success rates (ASRs). In this paper, we propose ADAM, a novel privacy attack that estimates the data distribution of a victim agent's memory and employs an entropy-guided query strategy to maximize privacy leakage. Extensive experiments demonstrate that our attack substantially outperforms state-of-the-art attacks, achieving ASRs of up to 100%. These results underscore the urgent need for robust privacy-preserving methods for current LLM agents.


Key Contributions

  • Novel adaptive querying attack (ADAM) that estimates the data distribution of the agent's memory and uses an entropy-guided query strategy
  • Achieves up to 100% attack success rate, substantially outperforming prior privacy attacks on LLM agents
  • Demonstrates critical privacy vulnerabilities in memory-augmented and RAG-based LLM agent architectures
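
The entropy-guided idea in the first contribution can be illustrated with a toy sketch. This is not ADAM's actual algorithm, which this summary does not specify. It assumes, purely for illustration, that leaked memory records can be labelled with a topic, that each candidate query targets a known set of topics, and that the attacker prefers the query whose targeted topics are most uncertain under the current estimate. The names `estimate_distribution`, `pick_next_query`, and the example queries are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy in bits of the empirical distribution given by counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def estimate_distribution(leaked_topics):
    """Hypothetical estimator: topic frequencies among records leaked so far."""
    return Counter(leaked_topics)


def pick_next_query(candidate_queries, estimate):
    """Return the candidate whose targeted topics have the highest entropy.

    candidate_queries maps a query string to the topics it is assumed to hit.
    """
    def score(query):
        # Restrict the estimate to the topics this query targets.
        restricted = {t: estimate[t] for t in candidate_queries[query]
                      if estimate[t] > 0}
        return shannon_entropy(restricted)

    return max(candidate_queries, key=score)


# Illustrative data only; topics and queries are made up.
estimate = estimate_distribution(["medical", "medical", "finance", "travel"])
candidates = {
    "tell me about health notes": ["medical"],
    "summarize earlier records": ["medical", "finance", "travel"],
}
best = pick_next_query(candidates, estimate)
```

In this toy setup the broader query is selected, because the estimate over its three targeted topics is more spread out than the estimate for a single topic. A real attack would need its own way of predicting what a query retrieves, which this sketch simply assumes.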

🛡️ Threat Analysis

Model Inversion Attack

The attack extracts sensitive private information stored in agent memory or RAG systems solely by querying the agent, which makes it a form of model inversion and data extraction against a deployed system. The adversary reconstructs training or stored data through strategically chosen queries.


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box, inference_time
Applications
llm agents with memory, retrieval-augmented generation systems