RefineRAG: Word-Level Poisoning Attacks via Retriever-Guided Text Refinement

Retrieval-Augmented Generation (RAG) significantly enhances Large Language Models (LLMs), but simultaneously exposes a critical vulnerability to knowledge poisoning attacks. Existing attack methods like PoisonedRAG remain detectable due to coarse-grained separate-and-concatenate strategies. To bridge this gap, we propose RefineRAG, a novel framework that treats poisoning as a holistic word-level refinement problem. It operates in two stages: Macro Generation produces toxic seeds guaranteed to induce target answers, while Micro Refinement employs a retriever-in-the-loop optimization to maximize retrieval priority without compromising naturalness. Evaluations on NQ and MSMARCO demonstrate that RefineRAG achieves state-of-the-art effectiveness, securing a 90% Attack Success Rate on NQ, while registering the lowest grammar errors and repetition rates among all baselines. Crucially, our proxy-optimized attacks successfully transfer to black-box victim systems, highlighting a severe practical threat.

Key Contributions

Two-stage word-level refinement framework for RAG poisoning: Macro Generation creates toxic seed texts, Micro Refinement uses retriever-in-the-loop optimization
Achieves 90% attack success rate on NQ while maintaining naturalness (lowest grammar errors and repetition rates)
Demonstrates transferability from white-box proxy optimization to black-box victim RAG systems

🛡️ Threat Analysis

Prompt Injection

The poisoned documents manipulate LLM behavior to generate misinformation when retrieved as context. While the attack vector is data poisoning, the goal is to hijack LLM generation through crafted context, which is a form of indirect prompt injection via the RAG pipeline.

Data Poisoning Attack

The paper attacks RAG systems by poisoning the external knowledge corpus (training/retrieval data) with adversarial documents designed to be retrieved and cause the LLM to generate incorrect answers. This is data poisoning targeting the retrieval component of the RAG pipeline.