attack 2026

Query-Efficient Agentic Graph Extraction Attacks on GraphRAG Systems

Shuhua Yang , Jiahao Zhang , Yilong Wang , Dongwon Lee , Suhang Wang

Pennsylvania State University

0 citations · 21 references · arXiv

Published on arXiv: 2601.14662

Model Theft

OWASP ML Top 10 — ML05

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

AGEA recovers up to 90% of entities and relationships from GraphRAG systems under fixed query budgets, significantly outperforming prior attack baselines while maintaining high precision.
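The recovery and precision figures can be read as set-based recall and precision over extracted graph elements. The snippet below is a minimal sketch under that assumption; the summary does not state the paper's exact matching rules, and the toy triples and the `precision_recall` helper are hypothetical.

```python
def precision_recall(extracted: set, truth: set) -> tuple[float, float]:
    """Set-based precision and recall of an extracted item set."""
    hits = len(extracted & truth)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical ground-truth edges, stored as (head, relation, tail) triples.
true_edges = {
    ("aspirin", "treats", "headache"),
    ("aspirin", "interacts_with", "warfarin"),
    ("warfarin", "treats", "thrombosis"),
    ("ibuprofen", "treats", "headache"),
}

# Hypothetical attack output: three correct edges and one spurious edge.
extracted_edges = {
    ("aspirin", "treats", "headache"),
    ("aspirin", "interacts_with", "warfarin"),
    ("warfarin", "treats", "thrombosis"),
    ("aspirin", "treats", "thrombosis"),  # not in the true graph
}

edge_precision, edge_recall = precision_recall(extracted_edges, true_edges)
```

Recall captures how much of the hidden graph was recovered, while precision penalizes spurious entities or edges that the attacker wrongly adds to the reconstruction.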

AGEA (Agentic Graph Extraction Attack)

Novel technique introduced

Graph-based retrieval-augmented generation (GraphRAG) systems construct knowledge graphs over document collections to support multi-hop reasoning. While prior work shows that GraphRAG responses may leak retrieved subgraphs, the feasibility of query-efficient reconstruction of the hidden graph structure remains unexplored under realistic query budgets. We study a budget-constrained black-box setting where an adversary adaptively queries the system to steal its latent entity-relation graph. We propose AGEA (Agentic Graph Extraction Attack), a framework that leverages a novelty-guided exploration-exploitation strategy, external graph memory modules, and a two-stage graph extraction pipeline combining lightweight discovery with LLM-based filtering. We evaluate AGEA on medical, agriculture, and literary datasets across Microsoft-GraphRAG and LightRAG systems. Under identical query budgets, AGEA significantly outperforms prior attack baselines, recovering up to 90% of entities and relationships while maintaining high precision. These results demonstrate that modern GraphRAG systems are highly vulnerable to structured, agentic extraction attacks, even under strict query limits.
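To make the budget-constrained, adaptive setting concrete, the following sketch runs an exploration-exploitation loop against a mock target while keeping discovered nodes and edges in an external memory. It is illustrative only: the epsilon-greedy rule, the `novelty` score, `mock_query`, and `HIDDEN_GRAPH` are assumptions introduced here, not the strategy or interfaces described in the paper.

```python
import random

# Hypothetical hidden graph standing in for the target's knowledge graph.
HIDDEN_GRAPH = {
    "aspirin": {"headache", "warfarin"},
    "warfarin": {"aspirin", "thrombosis"},
    "headache": {"aspirin", "ibuprofen"},
    "ibuprofen": {"headache"},
    "thrombosis": {"warfarin"},
}


def mock_query(entity):
    """Stand-in for one black-box query: returns directed edges around an entity."""
    return {(entity, nbr) for nbr in HIDDEN_GRAPH.get(entity, ())}


def run_attack(seed, budget, epsilon=0.2, rng=None):
    """Epsilon-greedy extraction loop with an external graph memory."""
    rng = rng or random.Random(0)
    nodes, edges = {seed}, set()   # external graph memory
    queried = set()                # entities already spent a query on
    novelty = {seed: 1.0}          # assumed score: expected yield of querying an entity
    for _ in range(budget):
        frontier = sorted(n for n in nodes if n not in queried)
        if not frontier:
            break                  # nothing left to ask about
        if rng.random() < epsilon:
            target = rng.choice(frontier)                             # explore
        else:
            target = max(frontier, key=lambda n: novelty.get(n, 0.0))  # exploit
        queried.add(target)
        new_edges = mock_query(target) - edges
        edges |= new_edges
        for _, nbr in new_edges:
            if nbr not in nodes:
                nodes.add(nbr)
                # Newly found entities inherit the yield of the query that found them.
                novelty[nbr] = float(len(new_edges))
    return nodes, edges, len(queried)


nodes, edges, used = run_attack("aspirin", budget=5)
```

In this toy setting each query targets one frontier entity, so the budget directly bounds how much of the graph can be reached; a real attack would issue natural-language queries and parse the responses instead of reading edges directly.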

Key Contributions

AGEA framework: a novelty-guided exploration-exploitation strategy with external graph memory to efficiently extract hidden knowledge graphs from GraphRAG systems under strict query budgets
Two-stage extraction pipeline combining lightweight entity-relation discovery with LLM-based filtering to improve precision
Empirical demonstration that Microsoft-GraphRAG and LightRAG systems are highly vulnerable, with AGEA recovering up to 90% of entities and relationships under realistic query limits
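The two-stage idea in the contributions above can be sketched as a cheap candidate-discovery pass followed by a filtering pass. This is a minimal sketch under stated assumptions: the pipe-delimited response format is invented for illustration, and `toy_judge` is a hypothetical rule-based placeholder for the LLM-based filter, whose prompts and criteria are not given in this summary.

```python
from typing import Callable

Triple = tuple[str, str, str]


def discover(response: str) -> list[Triple]:
    """Stage 1: lightweight discovery of candidate triples.

    Assumes (hypothetically) that relations appear as 'head | relation | tail' lines.
    """
    candidates = []
    for line in response.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            candidates.append((parts[0], parts[1], parts[2]))
    return candidates


def filter_candidates(
    candidates: list[Triple], judge: Callable[[Triple], bool]
) -> list[Triple]:
    """Stage 2: keep only the candidates the judge accepts."""
    return [t for t in candidates if judge(t)]


def toy_judge(triple: Triple) -> bool:
    """Placeholder for an LLM filter: rejects self-loops and 'unknown' entities."""
    head, _, tail = triple
    head, tail = head.lower(), tail.lower()
    return head != tail and "unknown" not in (head, tail)


response = """Here is what I found:
aspirin | treats | headache
aspirin | related_to | aspirin
unknown | causes | thrombosis
warfarin | treats | thrombosis"""

kept = filter_candidates(discover(response), toy_judge)
```

Separating the stages keeps the first pass cheap and high-recall, and reserves the more expensive judgment for a smaller candidate set, which is where precision is gained.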

🛡️ Threat Analysis

Model Theft

AGEA steals the core IP of a GraphRAG system — its latent entity-relation knowledge graph — through adaptive black-box queries, directly paralleling model extraction attacks that reconstruct a system's learned structure and functionality via API queries.

Details

Domains

nlp, graph

Model Types

llm, transformer

Threat Tags

black_box, inference_time, targeted

Datasets

medical dataset, agriculture dataset, literary novels dataset, Microsoft-GraphRAG, LightRAG

Applications

graphrag systems, knowledge graph-based question answering, retrieval-augmented generation

Similar Papers

Subgraph Reconstruction Attacks on Graph RAG Deployments with Practical Defenses

Exposing Privacy Risks in Graph Retrieval-Augmented Generation

PromptCOS: Towards Content-only System Prompt Copyright Auditing for LLMs

OptiLeak: Efficient Prompt Reconstruction via Reinforcement Learning in Multi-tenant LLM Services

Connect the Dots: Knowledge Graph-Guided Crawler Attack on Retrieval-Augmented Generation Systems

An Invariant Latent Space Perspective on Language Model Inversion

Stronger Re-identification Attacks through Reasoning and Aggregation

StolenLoRA: Exploring LoRA Extraction Attacks via Synthetic Data