A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models

Graph-based Retrieval-Augmented Generation (GraphRAG) has recently emerged as a promising paradigm for enhancing large language models (LLMs) by converting raw text into structured knowledge graphs, improving both accuracy and explainability. However, GraphRAG relies on LLMs to extract knowledge from raw text during graph construction, and this process can be maliciously manipulated to implant misleading information. Targeting this attack surface, we propose two knowledge poisoning attacks (KPAs) and demonstrate that modifying only a few words in the source text can significantly change the constructed graph, poison the GraphRAG system, and severely mislead downstream reasoning. The first attack, named Targeted KPA (TKPA), utilizes graph-theoretic analysis to locate vulnerable nodes in the generated graphs and rewrites the corresponding narratives with LLMs, achieving precise control over specific question-answering (QA) outcomes with a success rate of 93.1%, while keeping the poisoned text fluent and natural. The second attack, named Universal KPA (UKPA), exploits linguistic cues such as pronouns and dependency relations to disrupt the structural integrity of the generated graph by altering globally influential words. With fewer than 0.05% of the full text modified, the QA accuracy collapses from 95% to 50%. Furthermore, experiments show that state-of-the-art defense methods fail to detect these attacks, highlighting that securing GraphRAG pipelines against knowledge poisoning remains largely unexplored.

Key Contributions

TKPA: uses graph-theoretic centrality analysis to locate vulnerable nodes, then rewrites corresponding source passages to achieve 93.1% targeted QA manipulation success
UKPA: exploits linguistic cues (pronouns, dependency relations) to globally distort graph structure with <0.05% text modification, collapsing QA accuracy from 95% to 50%
Demonstrates that state-of-the-art defenses fail to detect either attack, exposing an underexplored manipulation-only attack surface for GraphRAG
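The node-selection idea behind TKPA can be illustrated with a small sketch. The summary above does not say which centrality measure the authors use, so normalized degree centrality is an assumption here, and the triples, `degree_centrality`, and `rank_candidate_nodes` are hypothetical names invented for illustration rather than the paper's implementation.

```python
from collections import defaultdict

def degree_centrality(triples):
    """Normalized degree centrality for entities in (head, relation, tail) triples.

    Assumption: the graph is treated as undirected and unweighted.
    """
    neighbors = defaultdict(set)
    for head, _relation, tail in triples:
        neighbors[head].add(tail)
        neighbors[tail].add(head)
    n = len(neighbors)
    if n <= 1:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}

def rank_candidate_nodes(triples, top_k=3):
    """Return the top_k entities by centrality (ties broken alphabetically)."""
    scores = degree_centrality(triples)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

# Hypothetical toy knowledge graph, standing in for LLM-extracted triples.
TOY_TRIPLES = [
    ("Alice", "works_at", "Acme"),
    ("Bob", "works_at", "Acme"),
    ("Acme", "located_in", "Berlin"),
    ("Carol", "lives_in", "Berlin"),
    ("Acme", "acquired", "Initech"),
]

scores = degree_centrality(TOY_TRIPLES)
ranking = rank_candidate_nodes(TOY_TRIPLES)
```

In this toy graph `Acme` touches four of the five other entities, so it ranks first. The same ranking is useful defensively: highly central entities are the ones whose source passages most deserve provenance checks before indexing.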

🛡️ Threat Analysis

Data Poisoning Attack

Both attacks corrupt the source corpus used during graph construction — the attack vector is the data itself. Subtle word modifications poison the knowledge graph at indexing/construction time, degrading or redirecting downstream reasoning. This is data poisoning of the RAG system's knowledge base.
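To make "subtle word modifications" concrete, the following sketch measures how much of a passage changed and how the extracted edge set differs between two corpus versions. It is only an illustration under stated assumptions: the sentences and triples are hand-written stand-ins for what an LLM extractor might produce, and `word_change_ratio` and `triple_diff` are hypothetical helpers, not part of any GraphRAG library or of the paper's method.

```python
def word_change_ratio(original, modified):
    """Fraction of word positions that differ between two texts.

    Assumption: only word substitutions occurred, so both texts have
    the same number of whitespace-separated words.
    """
    a, b = original.split(), modified.split()
    if len(a) != len(b):
        raise ValueError("this sketch only handles word substitutions")
    if not a:
        return 0.0
    changed = sum(1 for x, y in zip(a, b) if x != y)
    return changed / len(a)

def triple_diff(before, after):
    """Edges removed from and added to the graph between two extractions."""
    before_set, after_set = set(before), set(after)
    return {
        "removed": sorted(before_set - after_set),
        "added": sorted(after_set - before_set),
    }

# Hypothetical passage in which a single pronoun is replaced.
ORIGINAL = "Acme acquired Initech in 2019. It later opened an office in Berlin."
MODIFIED = "Acme acquired Initech in 2019. Initech later opened an office in Berlin."

# Hand-written triples standing in for extractor output on each version.
TRIPLES_BEFORE = [
    ("Acme", "acquired", "Initech"),
    ("Acme", "opened_office_in", "Berlin"),
]
TRIPLES_AFTER = [
    ("Acme", "acquired", "Initech"),
    ("Initech", "opened_office_in", "Berlin"),
]

ratio = word_change_ratio(ORIGINAL, MODIFIED)
diff = triple_diff(TRIPLES_BEFORE, TRIPLES_AFTER)
```

Here one word out of twelve changes, yet an edge is re-attached to a different entity. Comparing edge sets across corpus revisions in this way is one simple audit a pipeline owner could run at indexing time.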

Details

Domains

nlp, graph

Model Types

llm

Threat Tags

training_time, black_box, targeted

Applications

Similar Papers

RAG-targeted Adversarial Attack on LLM-based Threat Detection and Mitigation Framework

Thought-Transfer: Indirect Targeted Poisoning Attacks on Chain-of-Thought Reasoning Models

Cost-Minimized Label-Flipping Poisoning Attack to LLM Alignment

LLMAtKGE: Large Language Models as Explainable Attackers against Knowledge Graph Embeddings

Silent Sabotage During Fine-Tuning: Few-Shot Rationale Poisoning of Compact Medical LLMs

Layer of Truth: Probing Belief Shifts under Continual Pre-Training Poisoning

Subliminal Signals in Preference Labels

Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples