attack 2026

Overcoming the Retrieval Barrier: Indirect Prompt Injection in the Wild for LLM Systems

Hongyan Chang , Ergute Bao , Xinjian Luo , Ting Yu

Mohamed bin Zayed University of Artificial Intelligence

2 citations · 126 references · arXiv

Published on arXiv

2601.07072

Input Manipulation Attack

OWASP ML Top 10 — ML01

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Near-100% malicious content retrieval rate across 11 benchmarks and 8 embedding models; a single poisoned email coerces GPT-4o into exfiltrating SSH keys with over 80% success in a multi-agent workflow.

Trigger-Fragment IPI

Novel technique introduced

Large language models (LLMs) increasingly rely on retrieving information from external corpora. This creates a new attack surface: indirect prompt injection (IPI), where hidden instructions are planted in the corpora and hijack model behavior once retrieved. Previous studies have highlighted this risk but often avoid the hardest step: ensuring that malicious content is actually retrieved. In practice, unoptimized IPI is rarely retrieved under natural queries, which leaves its real-world impact unclear. We address this challenge by decomposing the malicious content into a trigger fragment that guarantees retrieval and an attack fragment that encodes arbitrary attack objectives. Based on this idea, we design an efficient and effective black-box attack algorithm that constructs a compact trigger fragment to guarantee retrieval for any attack fragment. Our attack requires only API access to embedding models, is cost-efficient (as little as $0.21 per target user query on OpenAI's embedding models), and achieves near-100% retrieval across 11 benchmarks and 8 embedding models (including both open-source models and proprietary services). Based on this attack, we present the first end-to-end IPI exploits under natural queries and realistic external corpora, spanning both RAG and agentic systems with diverse attack objectives. These results establish IPI as a practical and severe threat: when a user issued a natural query to summarize emails on frequently asked topics, a single poisoned email was sufficient to coerce GPT-4o into exfiltrating SSH keys with over 80% success in a multi-agent workflow. We further evaluate several defenses and find that they are insufficient to prevent the retrieval of malicious text, highlighting retrieval as a critical open vulnerability.

Key Contributions

Decomposes malicious IPI content into a retrieval-optimizing 'trigger fragment' and a goal-encoding 'attack fragment', solving the retrieval barrier that made prior IPI impractical
Efficient black-box attack algorithm requiring only embedding API access (~$0.21/query) that achieves near-100% retrieval across 11 benchmarks and 8 embedding models
First end-to-end IPI exploits under natural queries in realistic RAG and agentic systems, including >80% SSH key exfiltration success in a multi-agent GPT-4o workflow

🛡️ Threat Analysis

Input Manipulation Attack

Paper designs an adversarial algorithm that crafts 'trigger fragments' — content optimized via black-box embedding API access to guarantee retrieval by the RAG system. This is adversarial document injection for RAG: inputs are strategically crafted to manipulate the retrieval system's output, which is explicitly called out in the ML01 dual-tagging guidance.

Details

Domains

nlp

Model Types

llmtransformer

Threat Tags

black_boxinference_timetargeted

Datasets

11 benchmarks (unnamed in abstract)

Applications

rag systemsagentic ai systemsmulti-agent workflowsemail summarization

Read PDF arXiv DOI

Overcoming the Retrieval Barrier: Indirect Prompt Injection in the Wild for LLM Systems

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

ImportSnare: Directed "Code Manual" Hijacking in Retrieval-Augmented Code Generation

Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation

Dynamic Target Attack

RAG-Pull: Imperceptible Attacks on RAG Systems for Code Generation

EmoRAG: Evaluating RAG Robustness to Symbolic Perturbations

One Leak Away: How Pretrained Model Exposure Amplifies Jailbreak Risks in Finetuned LLMs

CoTDeceptor:Adversarial Code Obfuscation Against CoT-Enhanced LLM Code Agents

Beyond Suffixes: Token Position in GCG Adversarial Attacks on Large Language Models