Exposing Privacy Risks in Graph Retrieval-Augmented Generation
Jiale Liu, Jiahao Zhang, Suhang Wang
Published on arXiv: 2508.17222
Sensitive Information Disclosure
OWASP LLM Top 10 — LLM06
Key Finding
Graph RAG systems leak less raw text than standard document-based RAG, but they are significantly more vulnerable to the extraction of structured entity and relationship information.
Retrieval-Augmented Generation (RAG) is a powerful technique for enhancing Large Language Models (LLMs) with external, up-to-date knowledge. Graph RAG has emerged as an advanced paradigm that leverages graph-based knowledge structures to provide more coherent and contextually rich answers. However, the move from plain document retrieval to structured graph traversal introduces new, under-explored privacy risks. This paper investigates the data extraction vulnerabilities of Graph RAG systems. We design and execute tailored data extraction attacks to probe their susceptibility to leaking both raw text and structured data, such as entities and their relationships. Our findings reveal a critical trade-off: while Graph RAG systems may reduce raw text leakage, they are significantly more vulnerable to the extraction of structured entity and relationship information. We also explore potential defense mechanisms to mitigate these novel attack surfaces. This work provides a foundational analysis of the unique privacy challenges in Graph RAG and offers insights for building more secure systems.
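To make the trade-off concrete, here is a minimal, hypothetical sketch (not the authors' system) of what each kind of retriever might place into an LLM's context. The corpus, the triples, and the functions `document_rag_context` and `graph_rag_context` are illustrative assumptions. Real Graph RAG pipelines typically build the graph with LLM-based entity and relation extraction and retrieve with embedding search, not string matching.

```python
from dataclasses import dataclass

# Hypothetical private document; stands in for a sensitive corpus entry.
DOC = ("Alice Chen was treated by Dr. Bob Ortiz at Riverside Clinic "
       "for a heart condition in March.")


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str


# Hypothetical knowledge graph a Graph RAG indexer might build from DOC.
GRAPH = [
    Triple("Alice Chen", "TREATED_BY", "Dr. Bob Ortiz"),
    Triple("Dr. Bob Ortiz", "WORKS_AT", "Riverside Clinic"),
    Triple("Alice Chen", "DIAGNOSED_WITH", "heart condition"),
]


def document_rag_context(query: str, docs: list[str]) -> str:
    """Document RAG (toy): return every raw chunk sharing a word with the query."""
    words = set(query.lower().split())
    hits = [d for d in docs if words & set(d.lower().split())]
    return "\n".join(hits)


def graph_rag_context(query: str, graph: list[Triple]) -> str:
    """Graph RAG (toy): return triples whose head or tail entity the query names."""
    q = query.lower()
    hits = [t for t in graph if t.head.lower() in q or t.tail.lower() in q]
    return "\n".join(f"({t.head}) -[{t.relation}]-> ({t.tail})" for t in hits)


if __name__ == "__main__":
    query = "Tell me everything about Alice Chen"
    print(document_rag_context(query, [DOC]))  # the full raw sentence
    print(graph_rag_context(query, GRAPH))     # the two Alice Chen triples
```

In this toy, the same query returns the verbatim sentence from the document retriever but only entity-relation triples from the graph retriever. Less raw text reaches the model, yet who treated whom and who was diagnosed with what arrive already structured, which is consistent with the kind of exposure the paper describes.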
Key Contributions
- Tailored data extraction attacks designed for Graph RAG systems that probe leakage of both raw text and structured data (entities and relationships)
- Empirical finding of a critical trade-off: Graph RAG reduces raw text leakage compared to standard RAG but is significantly more vulnerable to structured entity/relationship extraction
- Foundational analysis of Graph RAG-specific privacy attack surfaces alongside exploration of potential defense mechanisms
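One way such a trade-off could be quantified is sketched below. The metrics are assumptions for illustration only: this summary does not state the paper's evaluation measures, so `raw_text_leakage` (verbatim word n-gram overlap with the private source), `entity_leakage` (share of ground-truth entity names mentioned), and `relation_leakage` (same-line co-mention of a related entity pair) are hypothetical stand-ins, not the authors' definitions.

```python
import re


def _tokens(text: str) -> list[str]:
    """Lowercase alphanumeric word tokens; punctuation and underscores are dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def raw_text_leakage(output: str, source: str, n: int = 5) -> float:
    """Fraction of the source's word n-grams that reappear verbatim in the output."""
    src, out = _tokens(source), _tokens(output)
    src_grams = {tuple(src[i:i + n]) for i in range(len(src) - n + 1)}
    out_grams = {tuple(out[i:i + n]) for i in range(len(out) - n + 1)}
    if not src_grams:
        return 0.0
    return len(src_grams & out_grams) / len(src_grams)


def entity_leakage(output: str, entities: set[str]) -> float:
    """Fraction of ground-truth entity names mentioned anywhere in the output."""
    if not entities:
        return 0.0
    text = output.lower()
    return sum(e.lower() in text for e in entities) / len(entities)


def relation_leakage(output: str, pairs: set[tuple[str, str]]) -> float:
    """Fraction of related (head, tail) pairs co-mentioned on a single output line.

    Same-line co-mention is a crude, assumed proxy for "this relation was disclosed".
    """
    if not pairs:
        return 0.0
    lines = [line.lower() for line in output.splitlines()]
    leaked = sum(
        any(h.lower() in line and t.lower() in line for line in lines)
        for h, t in pairs
    )
    return leaked / len(pairs)
```

Under these assumed metrics, an answer that lists triples such as `Alice Chen -> TREATED_BY -> Dr. Bob Ortiz` scores low on `raw_text_leakage` against the original sentence while scoring high on `entity_leakage` and `relation_leakage`, which mirrors the shape of the reported trade-off.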