attack 2026

RefineRAG: Word-Level Poisoning Attacks via Retriever-Guided Text Refinement

Ziye Wang 1, Guanyu Wang 2, Kailong Wang 1

Published on arXiv: 2604.07403

Data Poisoning Attack

OWASP ML Top 10 — ML02

Prompt Injection

OWASP LLM Top 10 — LLM01

Training Data Poisoning

OWASP LLM Top 10 — LLM03

Key Finding

Achieves a 90% attack success rate on NQ with the lowest grammar-error and repetition rates among all baselines; attacks optimized against a proxy also transfer to black-box victim systems

Novel technique introduced

RefineRAG


Retrieval-Augmented Generation (RAG) significantly enhances Large Language Models (LLMs), but simultaneously exposes a critical vulnerability to knowledge poisoning attacks. Existing attack methods like PoisonedRAG remain detectable due to coarse-grained separate-and-concatenate strategies. To bridge this gap, we propose RefineRAG, a novel framework that treats poisoning as a holistic word-level refinement problem. It operates in two stages: Macro Generation produces toxic seeds guaranteed to induce target answers, while Micro Refinement employs a retriever-in-the-loop optimization to maximize retrieval priority without compromising naturalness. Evaluations on NQ and MSMARCO demonstrate that RefineRAG achieves state-of-the-art effectiveness, securing a 90% Attack Success Rate on NQ, while registering the lowest grammar errors and repetition rates among all baselines. Crucially, our proxy-optimized attacks successfully transfer to black-box victim systems, highlighting a severe practical threat.
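
The abstract reports an Attack Success Rate and a repetition rate but this summary does not define either. The sketch below shows one common way such metrics are computed, as an illustration only: it assumes an attack counts as successful when the target answer appears as a substring of the generated answer, and that repetition is measured over word bigrams. Both function names and both definitions are assumptions, not the paper's evaluation code.

```python
from collections import Counter


def attack_success_rate(generated_answers, target_answers):
    """Fraction of questions whose generated answer contains the target answer.

    Assumption: case-insensitive substring match decides success.
    """
    if not generated_answers:
        return 0.0
    hits = sum(
        target.lower() in generated.lower()
        for generated, target in zip(generated_answers, target_answers)
    )
    return hits / len(generated_answers)


def repetition_rate(text, n=2):
    """Share of word n-grams in `text` that duplicate an earlier n-gram.

    Assumption: whitespace tokenization and n=2; the paper may use another definition.
    """
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(ngrams)


if __name__ == "__main__":
    answers = ["The capital is Paris.", "I do not know."]
    targets = ["paris", "london"]
    print(attack_success_rate(answers, targets))  # 0.5
    print(repetition_rate("a b a b a b"))         # 0.6
```

Under these definitions, a lower repetition rate indicates text that reads less like keyword stuffing, which is the sense in which the abstract ties repetition to naturalness.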


Key Contributions

  • Two-stage word-level refinement framework for RAG poisoning: Macro Generation creates toxic seed texts that induce the target answer, and Micro Refinement uses retriever-in-the-loop optimization to raise retrieval priority without compromising naturalness
  • Achieves a 90% attack success rate on NQ while maintaining naturalness (lowest grammar-error and repetition rates among all baselines)
  • Demonstrates transferability from white-box proxy optimization to black-box victim RAG systems
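
The retriever-in-the-loop idea in the first contribution can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes a bag-of-words cosine score as a stand-in for the proxy retriever, a hand-written synonym table as a stand-in for the naturalness constraint, and a greedy accept-if-better rule. The names `score`, `refine`, `substitutions`, and `protected` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def score(query, passage):
    """Toy proxy-retriever score (assumption: a real system would use a dense retriever)."""
    return cosine(Counter(query.lower().split()), Counter(passage.lower().split()))


def refine(seed, query, substitutions, protected, max_passes=3):
    """Greedy word-level refinement of `seed` against `query`.

    A substitution is kept only if it raises the proxy score. Words in
    `protected` (for example, the target answer) are never edited.
    """
    words = seed.split()
    best = score(query, " ".join(words))
    for _ in range(max_passes):
        improved = False
        for i, word in enumerate(words):
            if word.lower() in protected:
                continue
            for candidate in substitutions.get(word.lower(), []):
                trial = words[:i] + [candidate] + words[i + 1:]
                trial_score = score(query, " ".join(trial))
                if trial_score > best:
                    words, best, improved = trial, trial_score, True
                    break
        if not improved:
            break  # no single-word edit helps any more
    return " ".join(words), best


if __name__ == "__main__":
    query = "who wrote the example novel"
    seed = "Alice authored the example book"
    table = {"authored": ["wrote"], "book": ["novel"]}
    refined, final_score = refine(seed, query, table, protected={"alice"})
    print(refined)  # Alice wrote the example novel
```

The same loop is useful for defenders as a probe: scoring corpus passages against sensitive queries with several retrievers shows how far small word-level edits can shift ranking.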

🛡️ Threat Analysis

Prompt Injection

The poisoned documents manipulate LLM behavior to generate misinformation when retrieved as context. While the attack vector is data poisoning, the goal is to hijack LLM generation through crafted context, which is a form of indirect prompt injection via the RAG pipeline.

Data Poisoning Attack

The paper attacks RAG systems by poisoning the external knowledge corpus (the retrieval data) with adversarial documents designed to be retrieved and to cause the LLM to generate incorrect answers. This is data poisoning that targets the retrieval component of the RAG pipeline.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time, targeted
Datasets
NQ, MSMARCO
Applications
question answering, retrieval-augmented generation