MIRAGE: Misleading Retrieval-Augmented Generation via Black-box and Query-agnostic Poisoning Attacks
Tailun Chen 1, Yu He 1, Yan Wang 2, Shuo Shao 1, Haolun Zheng 1, Zhihao Liu 1, Jinfeng Li 2, Zhizhen Qin 3, Yuefeng Chen 2, Zhixuan Chu 1, Zhan Qin 1, Kui Ren 1
Published on arXiv
2512.08289
Data Poisoning Attack
OWASP ML Top 10 — ML02
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
MIRAGE significantly outperforms existing RAG poisoning baselines in both attack efficacy and stealthiness, with remarkable transferability across diverse retriever-LLM configurations under strict black-box, query-agnostic conditions.
MIRAGE
Novel technique introduced
Retrieval-Augmented Generation (RAG) systems enhance LLMs with external knowledge but introduce a critical attack surface: corpus poisoning. While recent studies have demonstrated the potential of such attacks, they typically rely on impractical assumptions, such as white-box access or known user queries, thereby underestimating the difficulty of real-world exploitation. In this paper, we bridge this gap by proposing MIRAGE, a novel multi-stage poisoning pipeline designed for strict black-box and query-agnostic environments. Operating on surrogate model feedback, MIRAGE functions as an automated optimization framework that integrates three key mechanisms: it utilizes persona-driven query synthesis to approximate latent user search distributions, employs semantic anchoring to imperceptibly embed these intents for high retrieval visibility, and leverages an adversarial variant of Test-Time Preference Optimization (TPO) to maximize persuasion. To rigorously evaluate this threat, we construct a new benchmark derived from three long-form, domain-specific datasets. Extensive experiments demonstrate that MIRAGE significantly outperforms existing baselines in both attack efficacy and stealthiness, exhibiting remarkable transferability across diverse retriever-LLM configurations and highlighting the urgent need for robust defense strategies.
Key Contributions
- MIRAGE: a multi-stage, black-box, query-agnostic RAG corpus poisoning pipeline that requires no white-box model access or prior knowledge of user queries
- Three integrated mechanisms: persona-driven query synthesis to approximate user intent distributions, semantic anchoring for high retrieval visibility, and adversarial Test-Time Preference Optimization (TPO) to maximize persuasion of injected content
- A new evaluation benchmark derived from three long-form domain-specific datasets, with experiments showing superior attack efficacy, stealthiness, and cross-retriever-LLM transferability over existing baselines
🛡️ Threat Analysis
MIRAGE is fundamentally a corpus poisoning attack — it injects adversarially crafted documents into the RAG knowledge base before deployment to corrupt the data the system retrieves and relies upon, fitting the data poisoning threat model directly.