
A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models

Jiayi Wen 1, Tianxin Chen 1, Zhirun Zheng 2, Cheng Huang 1

Published on arXiv: 2508.04276

Data Poisoning Attack (OWASP ML Top 10 — ML02)

Training Data Poisoning (OWASP LLM Top 10 — LLM03)

Key Finding

TKPA achieves 93.1% targeted attack success rate; UKPA collapses QA accuracy from 95% to 50% by modifying fewer than 0.05% of words in the source corpus.

TKPA / UKPA (Knowledge Poisoning Attacks)

Novel techniques introduced


Graph-based Retrieval-Augmented Generation (GraphRAG) has recently emerged as a promising paradigm for enhancing large language models (LLMs) by converting raw text into structured knowledge graphs, improving both accuracy and explainability. However, GraphRAG relies on LLMs to extract knowledge from raw text during graph construction, and this process can be maliciously manipulated to implant misleading information. Targeting this attack surface, we propose two knowledge poisoning attacks (KPAs) and demonstrate that modifying only a few words in the source text can significantly change the constructed graph, poison the GraphRAG, and severely mislead downstream reasoning. The first attack, named Targeted KPA (TKPA), utilizes graph-theoretic analysis to locate vulnerable nodes in the generated graphs and rewrites the corresponding narratives with LLMs, achieving precise control over specific question-answering (QA) outcomes with a success rate of 93.1%, while keeping the poisoned text fluent and natural. The second attack, named Universal KPA (UKPA), exploits linguistic cues such as pronouns and dependency relations to disrupt the structural integrity of the generated graph by altering globally influential words. With fewer than 0.05% of full text modified, the QA accuracy collapses from 95% to 50%. Furthermore, experiments show that state-of-the-art defense methods fail to detect these attacks, highlighting that securing GraphRAG pipelines against knowledge poisoning remains largely unexplored.


Key Contributions

  • TKPA: uses graph-theoretic centrality analysis to locate vulnerable nodes, then rewrites corresponding source passages to achieve 93.1% targeted QA manipulation success
  • UKPA: exploits linguistic cues (pronouns, dependency relations) to globally distort graph structure with <0.05% text modification, collapsing QA accuracy from 95% to 50%
  • Demonstrates that state-of-the-art defenses fail to detect either attack, exposing an underexplored manipulation-only attack surface for GraphRAG
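TKPA's first step, scoring graph nodes for influence so the attacker knows which source passages to rewrite, can be illustrated with a toy heuristic. The paper does not specify its exact centrality measure, so the sketch below uses plain degree centrality as a hypothetical stand-in; the knowledge graph and entity names are invented for illustration.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Count each node's degree over an undirected edge list.

    A simple stand-in for the paper's (unspecified) graph-theoretic
    vulnerability score: high-degree entities touch many extracted
    relations, so rewriting the passages that mention them perturbs
    more retrieval paths.
    """
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def find_vulnerable_nodes(edges, k=2):
    """Return the k most central entities as candidate poisoning targets."""
    deg = degree_centrality(edges)
    return sorted(deg, key=deg.get, reverse=True)[:k]

# Toy knowledge graph: entities as nodes, LLM-extracted relations as edges.
edges = [
    ("Marie Curie", "radium"), ("Marie Curie", "Nobel Prize"),
    ("Marie Curie", "polonium"), ("radium", "polonium"),
    ("Nobel Prize", "physics"),
]
print(find_vulnerable_nodes(edges))  # "Marie Curie" ranks first (degree 3)
```

In the actual attack, the top-ranked entities' source narratives would then be rewritten by an LLM to keep the poisoned text fluent while steering specific QA outcomes.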

🛡️ Threat Analysis

Data Poisoning Attack

Both attacks corrupt the source corpus used during graph construction — the attack vector is the data itself. Subtle word modifications poison the knowledge graph at indexing/construction time, degrading or redirecting downstream reasoning. This is data poisoning of the RAG system's knowledge base.
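To make the attack surface concrete, a UKPA-style edit can be sketched as a pronoun substitution: replacing repeat mentions of an entity with an ambiguous pronoun leaves the text fluent but breaks the coreference chains an LLM extractor relies on, fragmenting the resulting graph. This is a hypothetical heuristic for illustration, not the paper's published procedure; the example document is invented.

```python
import re

def pronoun_poison(text: str, entity: str, pronoun: str) -> str:
    """Replace all but the first mention of `entity` with `pronoun`.

    Illustrative only: a minimal word-level corpus edit that degrades
    entity extraction at graph-construction (indexing) time while
    changing well under 1% of the text.
    """
    pattern = re.compile(re.escape(entity))
    count = 0

    def repl(match):
        nonlocal count
        count += 1
        return match.group(0) if count == 1 else pronoun

    return pattern.sub(repl, text)

doc = ("Marie Curie discovered radium. Marie Curie later won "
       "the Nobel Prize, and Marie Curie shared it with Pierre.")
print(pronoun_poison(doc, "Marie Curie", "she"))
```

After the edit only one explicit mention survives, so an extractor without robust coreference resolution attributes the remaining relations to an unresolved "she" rather than to the entity node, mirroring the structural disruption the paper reports.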


Details

Domains
nlp, graph
Model Types
llm
Threat Tags
training_time, black_box, targeted
Applications
question answering, graph-based retrieval-augmented generation