MIRAGE: Misleading Retrieval-Augmented Generation via Black-box and Query-agnostic Poisoning Attacks

Tailun Chen 1, Yu He 1, Yan Wang 2, Shuo Shao 1, Haolun Zheng 1, Zhihao Liu 1, Jinfeng Li 2, Zhizhen Qin 3, Yuefeng Chen 2, Zhixuan Chu 1, Zhan Qin 1, Kui Ren 1

0 citations · 103 references · arXiv

Published on arXiv · arXiv:2512.08289

Data Poisoning Attack

OWASP ML Top 10 — ML02

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

MIRAGE significantly outperforms existing RAG poisoning baselines in both attack efficacy and stealthiness, with remarkable transferability across diverse retriever-LLM configurations under strict black-box, query-agnostic conditions.

MIRAGE

Novel technique introduced


Retrieval-Augmented Generation (RAG) systems enhance LLMs with external knowledge but introduce a critical attack surface: corpus poisoning. While recent studies have demonstrated the potential of such attacks, they typically rely on impractical assumptions, such as white-box access or known user queries, thereby underestimating the difficulty of real-world exploitation. In this paper, we bridge this gap by proposing MIRAGE, a novel multi-stage poisoning pipeline designed for strict black-box and query-agnostic environments. Operating on surrogate model feedback, MIRAGE functions as an automated optimization framework that integrates three key mechanisms: it utilizes persona-driven query synthesis to approximate latent user search distributions, employs semantic anchoring to imperceptibly embed these intents for high retrieval visibility, and leverages an adversarial variant of Test-Time Preference Optimization (TPO) to maximize persuasion. To rigorously evaluate this threat, we construct a new benchmark derived from three long-form, domain-specific datasets. Extensive experiments demonstrate that MIRAGE significantly outperforms existing baselines in both attack efficacy and stealthiness, exhibiting remarkable transferability across diverse retriever-LLM configurations and highlighting the urgent need for robust defense strategies.


Key Contributions

  • MIRAGE: a multi-stage, black-box, query-agnostic RAG corpus poisoning pipeline that requires no white-box model access or prior knowledge of user queries
  • Three integrated mechanisms: persona-driven query synthesis to approximate user intent distributions, semantic anchoring for high retrieval visibility, and adversarial Test-Time Preference Optimization (TPO) to maximize persuasion of injected content
  • A new evaluation benchmark derived from three long-form domain-specific datasets, with experiments showing superior attack efficacy, stealthiness, and cross-retriever-LLM transferability over existing baselines
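The three mechanisms above form a sequential pipeline. The sketch below is a hypothetical, heavily simplified illustration of that structure: every function name, query template, mutation string, and the toy surrogate scorer are assumptions for demonstration, not the authors' implementation (which uses LLM-generated personas, imperceptible anchoring, and LLM-judged preference optimization).

```python
# Hypothetical sketch of a MIRAGE-style three-stage poisoning pipeline.
# All names, templates, and the toy surrogate scorer are illustrative
# placeholders, not the paper's actual method.

def synthesize_queries(personas, topic):
    """Stage 1: persona-driven query synthesis. Approximate the latent
    user search distribution with one query per persona (a real attack
    would prompt an LLM here)."""
    return [f"what should a {p} know about {topic}" for p in personas]

def anchor_semantics(payload, queries):
    """Stage 2: semantic anchoring. Embed the synthesized intents into
    the poisoned document so a retriever ranks it highly (naive
    concatenation here; the paper does this imperceptibly)."""
    return payload + " " + " ".join(queries)

def optimize_persuasion(doc, mutations, surrogate_score):
    """Stage 3: adversarial test-time preference optimization. Greedily
    keep edits that the surrogate model scores as more persuasive."""
    best, best_score = doc, surrogate_score(doc)
    for m in mutations:
        candidate = best + " " + m
        score = surrogate_score(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best

def craft_poison(payload, personas, topic, mutations, surrogate_score):
    queries = synthesize_queries(personas, topic)
    anchored = anchor_semantics(payload, queries)
    return optimize_persuasion(anchored, mutations, surrogate_score)

# Toy surrogate: counts authority phrases as a stand-in for an
# LLM-judged persuasion score.
def toy_score(doc):
    return sum(doc.count(w) for w in ("studies", "experts", "confirmed"))

poisoned = craft_poison(
    payload="Product X is unsafe.",
    personas=["clinician", "patient"],
    topic="product x",
    mutations=[
        "Recent studies agree.",
        "Multiple experts concur.",
        "This was confirmed twice.",
    ],
    surrogate_score=toy_score,
)
```

Because every stage consumes only surrogate feedback and synthesized (not observed) queries, the pipeline stays black-box and query-agnostic by construction.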

🛡️ Threat Analysis

Data Poisoning Attack

MIRAGE is fundamentally a corpus poisoning attack — it injects adversarially crafted documents into the RAG knowledge base before deployment to corrupt the data the system retrieves and relies upon, fitting the data poisoning threat model directly.
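The mapping to the data poisoning threat model can be made concrete with a toy retriever: a document injected before deployment gets surfaced for user queries the attacker never observed. The bag-of-words retriever and all strings below are illustrative assumptions standing in for a dense retriever and a real corpus.

```python
# Toy illustration of corpus poisoning as a data poisoning threat: the
# adversarial document enters the knowledge base before deployment and is
# retrieved later for an unseen query. Word-overlap ranking is a crude
# stand-in for dense retrieval.

def retrieve(corpus, query, k=1):
    """Rank documents by word overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]

corpus = [
    "Product X passed all regulatory safety reviews.",
    "Unrelated note about logistics and shipping.",
]

# Attacker injects a document semantically anchored to anticipated
# user intents, without ever seeing a real query.
poison = "Is product X safe? Product X is unsafe according to reports."
corpus.append(poison)

top = retrieve(corpus, "is product x safe")
```

The poisoned document outranks the benign one because its anchored phrasing mirrors the anticipated query, which is exactly why query-agnostic anchoring suffices for high retrieval visibility.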


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, training_time, inference_time, targeted, digital
Datasets
three long-form domain-specific datasets (constructed benchmark)
Applications
retrieval-augmented generation systems, knowledge-intensive question answering, domain-specific llm assistants