Query-Efficient Agentic Graph Extraction Attacks on GraphRAG Systems
Shuhua Yang, Jiahao Zhang, Yilong Wang, Dongwon Lee, Suhang Wang
Published on arXiv: 2601.14662
Model Theft (OWASP ML Top 10: ML05)
Sensitive Information Disclosure (OWASP LLM Top 10: LLM06)
Key Finding
AGEA recovers up to 90% of entities and relationships from GraphRAG systems under fixed query budgets, significantly outperforming prior attack baselines while maintaining high precision.
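The headline result pairs recall (how much of the hidden graph is recovered) with precision (how much of the extracted output is real). This summary does not say how extracted items are matched against the hidden graph. The Python sketch below therefore assumes exact matching after case-folding and whitespace normalization. `extraction_scores` and its output keys are hypothetical names, and the paper's own scoring may differ.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def _norm(name: str) -> str:
    # Assumed matching rule: case-fold and collapse whitespace before comparing.
    return " ".join(name.lower().split())


def _precision_recall(true: Set, pred: Set) -> Tuple[float, float]:
    hits = len(true & pred)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(true) if true else 0.0
    return precision, recall


def extraction_scores(true_triples: Iterable[Triple],
                      pred_triples: Iterable[Triple]) -> Dict[str, float]:
    """Entity- and relation-level precision/recall of an extracted graph (hypothetical metric)."""
    true_t = {tuple(_norm(x) for x in t) for t in true_triples}
    pred_t = {tuple(_norm(x) for x in t) for t in pred_triples}
    # Entities are taken to be every head or tail that appears in some triple.
    true_e = {e for h, _, t in true_t for e in (h, t)}
    pred_e = {e for h, _, t in pred_t for e in (h, t)}
    ep, er = _precision_recall(true_e, pred_e)
    rp, rr = _precision_recall(true_t, pred_t)
    return {"entity_precision": ep, "entity_recall": er,
            "relation_precision": rp, "relation_recall": rr}


if __name__ == "__main__":
    hidden = [("Aspirin", "treats", "Headache"), ("Aspirin", "inhibits", "COX-1"),
              ("Headache", "symptom_of", "Migraine"), ("Migraine", "treated_by", "Triptan")]
    stolen = [("aspirin", "treats", "headache"), ("Aspirin ", "inhibits", "COX-1"),
              ("Aspirin", "causes", "Migraine")]
    print(extraction_scores(hidden, stolen))
```

In this toy call, four of the five hidden entities are recovered and none are spurious, giving entity precision 1.0 and recall 0.8. Only two of the four relations match exactly, giving relation precision 2/3 and recall 0.5. This illustrates why entity and relation scores are reported separately.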
Novel technique introduced: AGEA (Agentic Graph Extraction Attack)
Graph-based retrieval-augmented generation (GraphRAG) systems construct knowledge graphs over document collections to support multi-hop reasoning. While prior work shows that GraphRAG responses may leak retrieved subgraphs, the feasibility of query-efficient reconstruction of the hidden graph structure remains unexplored under realistic query budgets. We study a budget-constrained black-box setting where an adversary adaptively queries the system to steal its latent entity-relation graph. We propose AGEA (Agentic Graph Extraction Attack), a framework that leverages a novelty-guided exploration-exploitation strategy, external graph memory modules, and a two-stage graph extraction pipeline combining lightweight discovery with LLM-based filtering. We evaluate AGEA on medical, agricultural, and literary datasets across Microsoft-GraphRAG and LightRAG systems. Under identical query budgets, AGEA significantly outperforms prior attack baselines, recovering up to 90% of entities and relationships while maintaining high precision. These results demonstrate that modern GraphRAG systems are highly vulnerable to structured, agentic extraction attacks, even under strict query limits.
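To make the novelty-guided exploration-exploitation idea concrete, here is a minimal Python sketch under stated assumptions. It is not AGEA's implementation. The oracle is a toy stand-in for a GraphRAG endpoint that leaks every triple touching the queried entity. The names `GraphMemory`, `explore`, `novelty_threshold`, `HIDDEN`, and `toy_oracle` are hypothetical. Novelty is assumed to be the share of returned triples not already in memory. A productive query keeps the agent expanding around its neighbors (exploitation). A low-novelty query clears that queue and forces a jump to another known but unqueried entity (exploration).

```python
import random
from collections import deque
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


class GraphMemory:
    """Hypothetical external graph memory holding everything recovered so far."""

    def __init__(self) -> None:
        self.entities: Set[str] = set()
        self.triples: Set[Triple] = set()
        self.queried: Set[str] = set()

    def add(self, triples: List[Triple]) -> float:
        """Store triples; return novelty = share of distinct triples not seen before."""
        unique = set(triples)
        if not unique:
            return 0.0
        new = unique - self.triples
        self.triples |= new
        for head, _, tail in new:
            self.entities.update((head, tail))
        return len(new) / len(unique)


def explore(oracle: Callable[[str], List[Triple]], seed_entity: str, budget: int,
            novelty_threshold: float = 0.3, rng_seed: int = 0) -> GraphMemory:
    rng = random.Random(rng_seed)
    memory = GraphMemory()
    memory.entities.add(seed_entity)
    frontier = deque([seed_entity])  # exploitation queue around productive queries
    for _ in range(budget):  # each iteration spends exactly one query
        unqueried = sorted(memory.entities - memory.queried)
        if not unqueried:
            break  # every known entity has already been asked about
        target = None
        while frontier and target is None:
            candidate = frontier.popleft()
            if candidate not in memory.queried:
                target = candidate
        if target is None:
            # Exploration: jump to a random known-but-unqueried entity.
            target = rng.choice(unqueried)
        memory.queried.add(target)
        response = oracle(target)
        novelty = memory.add(response)
        if novelty >= novelty_threshold:
            # Exploitation: productive region, so queue its unqueried neighbors.
            for head, _, tail in response:
                frontier.extend(e for e in (head, tail) if e not in memory.queried)
        else:
            frontier.clear()  # stale region: force an exploratory jump next
    return memory


HIDDEN: List[Triple] = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "inhibits", "cox-1"),
    ("headache", "symptom_of", "migraine"),
    ("migraine", "treated_by", "triptan"),
    ("cox-1", "expressed_in", "stomach"),
]


def toy_oracle(entity: str) -> List[Triple]:
    """Toy stand-in for a GraphRAG endpoint that leaks every triple touching `entity`."""
    return [t for t in HIDDEN if entity in (t[0], t[2])]


if __name__ == "__main__":
    memory = explore(toy_oracle, "aspirin", budget=10)
    print(len(memory.triples), "of", len(HIDDEN), "triples after", len(memory.queried), "queries")
```

With `budget=10` this toy run recovers all five hidden triples after six queries and then stops early. With `budget=3` it holds four triples when the budget runs out. A real attack would replace `toy_oracle` with natural-language prompts and parse triples out of free-text answers.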
Key Contributions
- AGEA framework: a novelty-guided exploration-exploitation strategy with external graph memory to efficiently extract hidden knowledge graphs from GraphRAG systems under strict query budgets
- Two-stage extraction pipeline combining lightweight entity-relation discovery with LLM-based filtering to improve precision
- Empirical demonstration that Microsoft-GraphRAG and LightRAG systems are highly vulnerable: AGEA recovers up to 90% of entities and relationships under realistic query limits
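The two-stage extraction pipeline can be sketched in Python as follows. Both stages are illustrative assumptions, not the paper's method. Stage 1 (`discover`) treats capitalized spans as entity mentions and proposes every pair that shares a sentence, which is cheap and favors recall. Stage 2 (`llm_filter`) keeps only the candidates a judge labels with a relation, which restores precision. In a real pipeline the judge would be an LLM call. Here the hypothetical `stub_judge` accepts a pair only when a word from a small relation list sits directly between the two mentions, so the example runs offline.

```python
import re
from itertools import combinations
from typing import Callable, List, Optional, Set, Tuple

Candidate = Tuple[str, str, str]  # (head mention, tail mention, evidence sentence)
Triple = Tuple[str, str, str]     # (head, relation, tail)

# Stage 1 heuristic (assumption): capitalized multi-word spans are entity mentions.
ENTITY_RE = re.compile(r"\b[A-Z][\w-]*(?:\s+[A-Z][\w-]*)*")
LEADING_STOPWORDS = {"The", "A", "An", "This", "These", "It", "In"}


def discover(response: str) -> List[Candidate]:
    """Stage 1: cheap, recall-oriented proposal of entity pairs that share a sentence."""
    candidates: List[Candidate] = []
    for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
        mentions: List[str] = []
        for span in ENTITY_RE.findall(sentence):
            words = span.split()
            while words and words[0] in LEADING_STOPWORDS:
                words = words[1:]  # drop sentence-initial words such as "The"
            name = " ".join(words)
            if name and name not in mentions:
                mentions.append(name)
        for head, tail in combinations(mentions, 2):  # keeps order of appearance
            candidates.append((head, tail, sentence))
    return candidates


Judge = Callable[[str, str, str], Optional[str]]


def llm_filter(candidates: List[Candidate], judge: Judge) -> Set[Triple]:
    """Stage 2: keep only candidates the (expensive) judge labels with a relation."""
    kept: Set[Triple] = set()
    for head, tail, evidence in candidates:
        relation = judge(head, tail, evidence)
        if relation is not None:
            kept.add((head, relation, tail))
    return kept


RELATION_WORDS = {"treats", "inhibits", "causes"}  # hypothetical relation vocabulary


def stub_judge(head: str, tail: str, evidence: str) -> Optional[str]:
    """Offline placeholder for an LLM call: accept only 'head <relation> tail' phrasing."""
    i = evidence.find(head)
    j = evidence.find(tail, i + len(head)) if i >= 0 else -1
    if j < 0:
        return None
    between = evidence[i + len(head):j].strip()
    return between if between in RELATION_WORDS else None
```

For the input "Aspirin treats Headache and is sold by Bayer. The Mayo Clinic studied Migraine.", stage 1 proposes four candidate pairs. The stub judge keeps only ("Aspirin", "treats", "Headache"). This shows the division of labor: the cheap stage over-generates, and the costlier filter runs only on its short candidate list.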
🛡️ Threat Analysis
AGEA steals the core IP of a GraphRAG system, namely its latent entity-relation knowledge graph, through adaptive black-box queries. This directly parallels model extraction attacks, which reconstruct a system's learned structure and functionality via API queries.