LogicPoison: Logical Attacks on Graph Retrieval-Augmented Generation

Graph-based Retrieval-Augmented Generation (GraphRAG) enhances the reasoning capabilities of Large Language Models (LLMs) by grounding their responses in structured knowledge graphs. By leveraging community detection and relation filtering, GraphRAG systems demonstrate inherent resistance to traditional RAG attacks such as text poisoning and prompt injection. However, we find that the security of GraphRAG systems fundamentally relies on the topological integrity of the underlying graph, which can be undermined by implicitly corrupting logical connections without altering surface-level text semantics. To exploit this vulnerability, we propose LogicPoison, a novel attack framework that targets logical reasoning rather than injecting false content. Specifically, LogicPoison employs a type-preserving entity swapping mechanism that perturbs both global logic hubs, to disrupt overall graph connectivity, and query-specific reasoning bridges, to sever essential multi-hop inference paths. This approach reroutes valid reasoning into dead ends while maintaining surface-level textual plausibility. Comprehensive experiments across multiple benchmarks demonstrate that LogicPoison bypasses GraphRAG's defenses, significantly degrading performance and outperforming state-of-the-art baselines in both effectiveness and stealth. Our code is available at https://github.com/Jord8061/logicPoison.

Key Contributions

Novel graph topology poisoning attack (LogicPoison) that targets logical reasoning chains rather than injecting false content
Type-preserving entity swapping mechanism that disrupts global logic hubs and query-specific reasoning bridges
Demonstration that GraphRAG is vulnerable to graph structure attacks that maintain textual plausibility and bypass existing defenses
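
To make the idea of a type-preserving entity swap concrete, the following is a minimal Python sketch, not the authors' implementation. The corpus, the `ENTITY_TYPES` table, and the helper names (`mention_counts`, `pick_hub`, `pick_partner`, `swap_entities`) are all hypothetical, and raw mention frequency is used only as a stand-in for whatever criterion the paper uses to identify a global logic hub.

```python
import re
from collections import Counter

# Hypothetical entity-type table and toy corpus (illustrative only).
ENTITY_TYPES = {
    "Marie Curie": "PERSON",
    "Pierre Curie": "PERSON",
    "Albert Einstein": "PERSON",
    "Warsaw": "CITY",
    "Paris": "CITY",
    "Ulm": "CITY",
}

DOCS = [
    "Marie Curie was born in Warsaw.",
    "Marie Curie worked in Paris.",
    "Marie Curie married Pierre Curie.",
    "Albert Einstein was born in Ulm.",
]


def mention_counts(docs, entity_types):
    """Count how often each known entity name appears in the corpus."""
    counts = Counter()
    for doc in docs:
        for name in entity_types:
            counts[name] += doc.count(name)
    return counts


def pick_hub(docs, entity_types):
    """Assumption: the most-mentioned entity approximates a 'logic hub'."""
    counts = mention_counts(docs, entity_types)
    # Iterate in sorted order so ties are broken deterministically by name.
    return max(sorted(entity_types), key=lambda name: counts[name])


def pick_partner(hub, entity_types):
    """Pick a different entity of the same type, so the swap preserves type."""
    same_type = sorted(
        name
        for name, etype in entity_types.items()
        if etype == entity_types[hub] and name != hub
    )
    return same_type[0] if same_type else None


def swap_entities(text, a, b):
    """Swap every mention of a and b in a single pass over the text."""
    pattern = re.compile("|".join(re.escape(x) for x in (a, b)))
    return pattern.sub(lambda m: b if m.group(0) == a else a, text)


hub = pick_hub(DOCS, ENTITY_TYPES)
partner = pick_partner(hub, ENTITY_TYPES)
poisoned_docs = [swap_entities(doc, hub, partner) for doc in DOCS]

for doc in poisoned_docs:
    print(doc)
```

Each poisoned sentence stays grammatical and type-consistent (a person is still born in a city), yet the edges a graph extractor would derive from it now attach to the wrong node. That is the sense in which the text stays plausible while the logic is corrupted.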

🛡️ Threat Analysis

Input Manipulation Attack

Qualifies as input manipulation because the attack crafts adversarial content (entity swaps in documents) that causes incorrect LLM outputs at inference time by disrupting retrieval.

Data Poisoning Attack

Corrupts training/knowledge base data (the knowledge graph) by injecting entity swaps that degrade system performance. This is data poisoning that targets the graph construction phase.
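
The effect on graph construction can be illustrated with a toy example. The sketch below is a simplified assumption, not the paper's pipeline: hand-written triples stand in for the output of an extraction step, the helpers (`build_graph`, `reachable`, `swap_in`) are hypothetical, and the attacker is assumed to control only some of the source documents (here, the documents behind triples 0 and 2).

```python
from collections import defaultdict, deque


def build_graph(triples):
    """Build a directed adjacency map from (head, relation, tail) triples."""
    graph = defaultdict(set)
    for head, _relation, tail in triples:
        graph[head].add(tail)
    return graph


def reachable(graph, start, goal):
    """Breadth-first search: is there a multi-hop path from start to goal?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def swap_in(triples, a, b, indices):
    """Swap entities a and b, but only in the triples at the given indices."""
    mapping = {a: b, b: a}
    out = []
    for i, (head, relation, tail) in enumerate(triples):
        if i in indices:
            head, tail = mapping.get(head, head), mapping.get(tail, tail)
        out.append((head, relation, tail))
    return out


# Clean knowledge: the query "Where was the director of Inception born?"
# needs the two-hop path Inception -> Christopher Nolan -> London.
clean = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Christopher Nolan", "born_in", "London"),
    ("Barbie", "directed_by", "Greta Gerwig"),
]

# Type-preserving swap (PERSON <-> PERSON) of the bridge entity, applied
# only to the documents behind triples 0 and 2; triple 1 is left untouched.
poisoned = swap_in(clean, "Christopher Nolan", "Greta Gerwig", {0, 2})

print(reachable(build_graph(clean), "Inception", "London"))
print(reachable(build_graph(poisoned), "Inception", "London"))
print(reachable(build_graph(poisoned), "Barbie", "London"))
```

In the poisoned graph the path from "Inception" ends at a node with no outgoing `born_in` edge, while an unrelated entity now reaches "London". Note that swapping the bridge consistently in every triple would only relabel the node and leave the path intact, which is why this sketch edits a subset of the documents.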