A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models
Jiayi Wen 1, Tianxin Chen 1, Zhirun Zheng 2, Cheng Huang 1
Published on arXiv
2508.04276
Data Poisoning Attack
OWASP ML Top 10 — ML02
Training Data Poisoning
OWASP LLM Top 10 — LLM03
Key Finding
TKPA achieves a 93.1% targeted attack success rate; UKPA collapses QA accuracy from 95% to 50% by modifying fewer than 0.05% of the source text.
TKPA / UKPA (Knowledge Poisoning Attacks)
Novel technique introduced
Graph-based Retrieval-Augmented Generation (GraphRAG) has recently emerged as a promising paradigm for enhancing large language models (LLMs) by converting raw text into structured knowledge graphs, improving both accuracy and explainability. However, GraphRAG relies on LLMs to extract knowledge from raw text during graph construction, and this process can be maliciously manipulated to implant misleading information. Targeting this attack surface, we propose two knowledge poisoning attacks (KPAs) and demonstrate that modifying only a few words in the source text can significantly change the constructed graph, poison the GraphRAG, and severely mislead downstream reasoning. The first attack, named Targeted KPA (TKPA), utilizes graph-theoretic analysis to locate vulnerable nodes in the generated graphs and rewrites the corresponding narratives with LLMs, achieving precise control over specific question-answering (QA) outcomes with a success rate of 93.1%, while keeping the poisoned text fluent and natural. The second attack, named Universal KPA (UKPA), exploits linguistic cues such as pronouns and dependency relations to disrupt the structural integrity of the generated graph by altering globally influential words. With fewer than 0.05% of full text modified, the QA accuracy collapses from 95% to 50%. Furthermore, experiments show that state-of-the-art defense methods fail to detect these attacks, highlighting that securing GraphRAG pipelines against knowledge poisoning remains largely unexplored.
Key Contributions
- TKPA: uses graph-theoretic centrality analysis to locate vulnerable nodes, then rewrites corresponding source passages to achieve 93.1% targeted QA manipulation success
- UKPA: exploits linguistic cues (pronouns, dependency relations) to globally distort graph structure with <0.05% text modification, collapsing QA accuracy from 95% to 50%
- Demonstrates that state-of-the-art defenses fail to detect either attack, exposing an underexplored manipulation-only attack surface for GraphRAG
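To make the idea of "locating vulnerable nodes" more concrete, here is a minimal sketch that ranks the nodes of a toy knowledge graph by degree centrality. It is an illustration only: this summary does not say which centrality measure TKPA uses, so degree centrality, the example triples, and the helper name `rank_by_degree` are all assumptions rather than the authors' method.

```python
from collections import Counter

# Hypothetical toy triples (subject, relation, object); not taken from the paper.
TRIPLES = [
    ("Alice", "works_at", "Acme"),
    ("Bob", "works_at", "Acme"),
    ("Acme", "based_in", "Berlin"),
    ("Carol", "lives_in", "Berlin"),
    ("Acme", "acquired", "Initech"),
]


def rank_by_degree(triples):
    """Return (node, degree) pairs sorted by descending degree, then by name."""
    degree = Counter()
    for subj, _rel, obj in triples:
        # Each triple is one edge, so it adds 1 to the degree of both endpoints.
        degree[subj] += 1
        degree[obj] += 1
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))


ranking = rank_by_degree(TRIPLES)
for node, deg in ranking:
    print(f"{node}: {deg}")
```

In this toy graph `Acme` ranks first because four of the five edges touch it, so an extraction error around that entity would affect the most relations. A defender could plausibly use the same ranking to decide which source passages deserve the closest review.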
🛡️ Threat Analysis
Both attacks corrupt the source corpus used during graph construction — the attack vector is the data itself. Subtle word modifications poison the knowledge graph at indexing/construction time, degrading or redirecting downstream reasoning. This is data poisoning of the RAG system's knowledge base.
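As a rough way to see how small such a modification budget is, the sketch below measures the fraction of words that differ between two versions of a text, using Python's standard `difflib`. The function name `modified_word_fraction`, the whitespace tokenization, and the word-level counting rule are assumptions made for illustration; the summary does not specify how the paper counts modified text.

```python
import difflib


def modified_word_fraction(original: str, revised: str) -> float:
    """Fraction of the original's word count touched by non-equal diff operations."""
    a, b = original.split(), revised.split()
    if not a:
        return 0.0 if not b else 1.0
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Count the larger side so pure insertions are not scored as zero.
            changed += max(i2 - i1, j2 - j1)
    return changed / len(a)


# Hypothetical example: a single pronoun swap in a ten-word sentence.
before = "She joined Acme in 2019 and later moved to Berlin"
after = "He joined Acme in 2019 and later moved to Berlin"
fraction = modified_word_fraction(before, after)
print(f"{fraction:.2%} of words modified")
```

For scale, a 0.05% budget on a 100,000-word corpus corresponds to 50 words. A check like this could serve as a simple audit between corpus snapshots, although a change this small is easy to miss unless every revision is diffed.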