attack 2026

Query-Efficient Agentic Graph Extraction Attacks on GraphRAG Systems

Shuhua Yang , Jiahao Zhang , Yilong Wang , Dongwon Lee , Suhang Wang

0 citations · 21 references · arXiv

Published on arXiv

2601.14662

Model Theft

OWASP ML Top 10 — ML05

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

AGEA recovers up to 90% of entities and relationships from GraphRAG systems under fixed query budgets, significantly outperforming prior attack baselines while maintaining high precision.

AGEA (Agentic Graph Extraction Attack)

Novel technique introduced


Graph-based retrieval-augmented generation (GraphRAG) systems construct knowledge graphs over document collections to support multi-hop reasoning. While prior work shows that GraphRAG responses may leak retrieved subgraphs, the feasibility of query-efficient reconstruction of the hidden graph structure remains unexplored under realistic query budgets. We study a budget-constrained black-box setting where an adversary adaptively queries the system to steal its latent entity-relation graph. We propose AGEA (Agentic Graph Extraction Attack), a framework that leverages a novelty-guided exploration-exploitation strategy, external graph memory modules, and a two-stage graph extraction pipeline combining lightweight discovery with LLM-based filtering. We evaluate AGEA on medical, agriculture, and literary datasets across Microsoft-GraphRAG and LightRAG systems. Under identical query budgets, AGEA significantly outperforms prior attack baselines, recovering up to 90% of entities and relationships while maintaining high precision. These results demonstrate that modern GraphRAG systems are highly vulnerable to structured, agentic extraction attacks, even under strict query limits.
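To make the recovery and precision figures concrete, the following is a minimal sketch of how an extracted graph might be scored against the hidden one. It assumes both graphs are available as sets of (head, relation, tail) triples and that matching is exact; the summary above does not state the paper's matching rules, and the function name and toy triples are hypothetical.

```python
def precision_recall(extracted, truth):
    """Score extracted triples against ground-truth triples by exact match.

    Assumption: exact string equality decides a match; a real evaluation
    may normalize or fuzzy-match entity names instead.
    """
    extracted, truth = set(extracted), set(truth)
    hits = len(extracted & truth)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Toy data, invented for illustration only.
truth_edges = {
    ("aspirin", "treats", "headache"),
    ("aspirin", "interacts_with", "warfarin"),
    ("warfarin", "treats", "thrombosis"),
    ("ibuprofen", "treats", "headache"),
}
extracted_edges = {
    ("aspirin", "treats", "headache"),
    ("aspirin", "interacts_with", "warfarin"),
    ("warfarin", "treats", "thrombosis"),
    ("aspirin", "treats", "thrombosis"),  # hallucinated edge, hurts precision
}

precision, recall = precision_recall(extracted_edges, truth_edges)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here recall corresponds to the share of the hidden graph that was recovered, and precision to the share of extracted triples that actually exist in it. The same function can be applied to entity sets instead of triples.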


Key Contributions

  • AGEA framework: a novelty-guided exploration-exploitation strategy with external graph memory to efficiently extract hidden knowledge graphs from GraphRAG systems under strict query budgets
  • Two-stage extraction pipeline combining lightweight entity-relation discovery with LLM-based filtering to improve precision
  • Empirical demonstration that Microsoft-GraphRAG and LightRAG systems are highly vulnerable, with AGEA recovering up to 90% of entities and relationships under realistic query limits
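The contributions above can be illustrated with a toy, budget-constrained extraction loop. This is a sketch under strong assumptions, not the authors' implementation: the target is simulated by a hypothetical `toy_target` function that leaks every triple touching the queried entity, the novelty score is simply the fraction of unseen triples in the last response, and the LLM-based filtering stage is omitted. All names and data are invented for illustration.

```python
import random

# Hypothetical hidden graph held by the target system (toy data).
HIDDEN_EDGES = {
    ("aspirin", "treats", "headache"),
    ("aspirin", "interacts_with", "warfarin"),
    ("warfarin", "treats", "thrombosis"),
    ("thrombosis", "risk_factor", "smoking"),
    ("wheat", "affected_by", "rust"),
    ("rust", "controlled_by", "fungicide"),
}


def toy_target(entity):
    """Stand-in for one black-box query: returns triples touching `entity`."""
    return {e for e in HIDDEN_EDGES if entity in (e[0], e[2])}


def extract(seeds, budget, rng):
    """Recover triples with at most `budget` queries, guided by novelty."""
    memory_edges = set()   # external graph memory: triples seen so far
    known = set(seeds)     # entities discovered so far
    queried = set()        # entities already spent a query on
    novelty = 1.0          # optimistic start: assume queries pay off
    used = 0
    while used < budget:
        frontier = sorted(known - queried)
        if not frontier:
            break
        if rng.random() < novelty:
            # Exploit: probe the frontier entity best connected in memory.
            target = max(
                frontier,
                key=lambda n: sum(n in (h, t) for h, _, t in memory_edges),
            )
        else:
            # Explore: low recent novelty, so jump to a random frontier entity.
            target = rng.choice(frontier)
        response = toy_target(target)
        used += 1
        queried.add(target)
        new_edges = response - memory_edges
        novelty = len(new_edges) / len(response) if response else 0.0
        memory_edges |= new_edges
        for head, _, tail in new_edges:
            known.update((head, tail))
    return memory_edges, used


edges, used = extract(["aspirin", "wheat"], budget=20, rng=random.Random(0))
print(f"recovered {len(edges)} of {len(HIDDEN_EDGES)} triples in {used} queries")
```

In this toy setting every query is informative, so the loop recovers the full graph well within the budget. Against a real system, responses are free text, which is where a discovery step followed by a filtering step would come in.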

🛡️ Threat Analysis

Model Theft

AGEA steals the core IP of a GraphRAG system — its latent entity-relation knowledge graph — through adaptive black-box queries, directly paralleling model extraction attacks that reconstruct a system's learned structure and functionality via API queries.


Details

Domains
nlp, graph
Model Types
llm, transformer
Threat Tags
black_box, inference_time, targeted
Datasets
medical dataset, agriculture dataset, literary novels dataset, Microsoft-GraphRAG, LightRAG
Applications
graphrag systems, knowledge graph-based question answering, retrieval-augmented generation