attack 2025

MIRAGE: Misleading Retrieval-Augmented Generation via Black-box and Query-agnostic Poisoning Attacks

Tailun Chen ¹, Yu He ¹, Yan Wang ², Shuo Shao ¹, Haolun Zheng ¹, Zhihao Liu ¹, Jinfeng Li ², Zhizhen Qin ³, Yuefeng Chen ², Zhixuan Chu ¹, Zhan Qin ¹, Kui Ren ¹

¹ Zhejiang University

² Alibaba Group

³ Amazon

0 citations · 103 references · arXiv

Published on arXiv

2512.08289

Data Poisoning Attack

OWASP ML Top 10 — ML02

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

MIRAGE significantly outperforms existing RAG poisoning baselines in both attack efficacy and stealthiness, with remarkable transferability across diverse retriever-LLM configurations under strict black-box, query-agnostic conditions.

MIRAGE

Novel technique introduced

Retrieval-Augmented Generation (RAG) systems enhance LLMs with external knowledge but introduce a critical attack surface: corpus poisoning. While recent studies have demonstrated the potential of such attacks, they typically rely on impractical assumptions, such as white-box access or known user queries, thereby underestimating the difficulty of real-world exploitation. In this paper, we bridge this gap by proposing MIRAGE, a novel multi-stage poisoning pipeline designed for strict black-box and query-agnostic environments. Operating on surrogate model feedback, MIRAGE functions as an automated optimization framework that integrates three key mechanisms: it utilizes persona-driven query synthesis to approximate latent user search distributions, employs semantic anchoring to imperceptibly embed these intents for high retrieval visibility, and leverages an adversarial variant of Test-Time Preference Optimization (TPO) to maximize persuasion. To rigorously evaluate this threat, we construct a new benchmark derived from three long-form, domain-specific datasets. Extensive experiments demonstrate that MIRAGE significantly outperforms existing baselines in both attack efficacy and stealthiness, exhibiting remarkable transferability across diverse retriever-LLM configurations and highlighting the urgent need for robust defense strategies.

Key Contributions

MIRAGE: a multi-stage, black-box, query-agnostic RAG corpus poisoning pipeline that requires no white-box model access or prior knowledge of user queries
Three integrated mechanisms: persona-driven query synthesis to approximate user intent distributions, semantic anchoring for high retrieval visibility, and adversarial Test-Time Preference Optimization (TPO) to maximize persuasion of injected content
A new evaluation benchmark derived from three long-form domain-specific datasets, with experiments showing superior attack efficacy, stealthiness, and cross-retriever-LLM transferability over existing baselines

🛡️ Threat Analysis

Data Poisoning Attack

MIRAGE is fundamentally a corpus poisoning attack — it injects adversarially crafted documents into the RAG knowledge base before deployment to corrupt the data the system retrieves and relies upon, fitting the data poisoning threat model directly.

Details

Domains

nlp

Model Types

llmtransformer

Threat Tags

black_boxtraining_timeinference_timetargeteddigital

Datasets

three long-form domain-specific datasets (constructed benchmark)

Applications

retrieval-augmented generation systemsknowledge-intensive question answeringdomain-specific llm assistants

Read PDF arXiv DOI

MIRAGE: Misleading Retrieval-Augmented Generation via Black-box and Query-agnostic Poisoning Attacks

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

ADMIT: Few-shot Knowledge Poisoning Attacks on RAG-based Fact Checking

When Safety Becomes a Vulnerability: Exploiting LLM Alignment Homogeneity for Transferable Blocking in RAG

Confundo: Learning to Generate Robust Poison for Practical RAG Systems

Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-based Optimizers

Who Taught the Lie? Responsibility Attribution for Poisoned Knowledge in Retrieval-Augmented Generation

Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System

MemoryGraft: Persistent Compromise of LLM Agents via Poisoned Experience Retrieval

ProGRank: Probe-Gradient Reranking to Defend Dense-Retriever RAG from Corpus Poisoning