Exposing Privacy Risks in Graph Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) is a powerful technique for enhancing Large Language Models (LLMs) with external, up-to-date knowledge. Graph RAG has emerged as an advanced paradigm that leverages graph-based knowledge structures to provide more coherent and contextually rich answers. However, the move from plain document retrieval to structured graph traversal introduces new, under-explored privacy risks. This paper investigates the data extraction vulnerabilities of Graph RAG systems. We design and execute tailored data extraction attacks to probe their susceptibility to leaking both raw text and structured data, such as entities and their relationships. Our findings reveal a critical trade-off: while Graph RAG systems may reduce raw text leakage, they are significantly more vulnerable to the extraction of structured entity and relationship information. We also explore potential defense mechanisms to mitigate these novel attack surfaces. This work provides a foundational analysis of the unique privacy challenges in Graph RAG and offers insights for building more secure systems.

Key Contributions

Tailored data extraction attacks designed for Graph RAG systems that probe leakage of both raw text and structured data (entities and relationships)
Empirical finding of a critical trade-off: Graph RAG reduces raw text leakage compared to standard RAG but is significantly more vulnerable to structured entity/relationship extraction
Foundational analysis of Graph RAG-specific privacy attack surfaces, alongside an exploration of potential defense mechanisms
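
The trade-off described above distinguishes two kinds of leakage: verbatim source text and structured facts (entities and relationships). The sketch below is one illustrative way such leakage could be scored on a model output. It is not the paper's evaluation method. The function names, the 5-word overlap threshold, the "head and tail in the same sentence" heuristic, and the toy data are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def _ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_leak_count(output: str, chunks: Iterable[str], min_words: int = 5) -> int:
    """Count source chunks sharing a run of min_words consecutive words with output."""
    out_grams = _ngrams(output, min_words)
    return sum(1 for chunk in chunks if _ngrams(chunk, min_words) & out_grams)


def entity_leak_rate(output: str, entities: Set[str]) -> float:
    """Fraction of private entity names that appear in output (case-insensitive)."""
    if not entities:
        return 0.0
    text = output.lower()
    return sum(1 for e in entities if e.lower() in text) / len(entities)


def relation_leak_rate(output: str, triples: Set[Triple]) -> float:
    """Fraction of triples whose head and tail co-occur in one sentence of output.

    Assumption: sentence-level co-occurrence is a crude proxy for a revealed
    relationship; it ignores the relation label itself.
    """
    if not triples:
        return 0.0
    sentences = [s.lower() for s in output.split(".")]
    leaked = sum(
        1
        for head, _rel, tail in triples
        if any(head.lower() in s and tail.lower() in s for s in sentences)
    )
    return leaked / len(triples)


# Toy data (hypothetical): the output paraphrases the source, so no long
# verbatim run is shared, yet entities and one relationship are exposed.
chunks = ["Alice Chen joined Acme Corp in 2019 as a staff engineer on the payments team."]
entities = {"Alice Chen", "Acme Corp", "Bob Ruiz"}
triples = {
    ("Alice Chen", "works_at", "Acme Corp"),
    ("Bob Ruiz", "manages", "Alice Chen"),
}
output = "Alice Chen is an employee of Acme Corp. Bob Ruiz is mentioned elsewhere."

raw_leaks = verbatim_leak_count(output, chunks)
ent_rate = entity_leak_rate(output, entities)
rel_rate = relation_leak_rate(output, triples)
```

In this toy case the paraphrased output scores zero on verbatim overlap while still exposing every entity and half of the relationships, which is the kind of gap a text-only leakage metric would miss.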