survey 2026

The Landscape of Prompt Injection Threats in LLM Agents: From Taxonomy to Analysis

Peiran Wang ¹, Xinfeng Li ², Chong Xiang ³, Jinghuai Zhang ¹, Ying Li ¹, Lixia Zhang ¹, XiaoFeng Wang ², Yuan Tian ¹

¹ UCLA

² NTU

³ NVIDIA

0 citations · arXiv (Cornell University)

Published on arXiv

2602.10453

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Many existing defenses appear effective on current benchmarks by suppressing contextual inputs, but fail to generalize to realistic agent settings where context-dependent reasoning is essential, and no single defense achieves the trustworthiness-utility-latency triad.

AgentPI

Novel technique introduced

The evolution of Large Language Models (LLMs) has resulted in a paradigm shift towards autonomous agents, necessitating robust security against Prompt Injection (PI) vulnerabilities where untrusted inputs hijack agent behaviors. This SoK presents a comprehensive overview of the PI landscape, covering attacks, defenses, and their evaluation practices. Through a systematic literature review and quantitative analysis, we establish taxonomies that categorize PI attacks by payload generation strategies (heuristic vs. optimization) and defenses by intervention stages (text, model, and execution levels). Our analysis reveals a key limitation shared by many existing defenses and benchmarks: they largely overlook context-dependent tasks, in which agents are authorized to rely on runtime environmental observations to determine actions. To address this gap, we introduce AgentPI, a new benchmark designed to systematically evaluate agent behavior under context-dependent interaction settings. Using AgentPI, we empirically evaluate representative defenses and show that no single approach can simultaneously achieve high trustworthiness, high utility, and low latency. Moreover, we show that many defenses appear effective under existing benchmarks by suppressing contextual inputs, yet fail to generalize to realistic agent settings where context-dependent reasoning is essential. This SoK distills key takeaways and open research problems, offering structured guidance for future research and practical deployment of secure LLM agents.

Key Contributions

Comprehensive taxonomy of PI attacks categorized by payload generation strategy (heuristic vs. optimization) and defenses by intervention stage (text, model, execution levels)
AgentPI benchmark designed to systematically evaluate agent behavior under context-dependent interaction settings, exposing a critical blind spot in existing benchmarks
Empirical finding that no single defense simultaneously achieves high trustworthiness, high utility, and low latency, and that many defenses appear effective only by suppressing contextual inputs rather than handling PI robustly

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

inference_timetargetedblack_box

Datasets

AgentPI

Applications

llm agentsautonomous ai agents

Read PDF arXiv DOI

The Landscape of Prompt Injection Threats in LLM Agents: From Taxonomy to Analysis

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

Cybersecurity AI: Hacking the AI Hackers via Prompt Injection

MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks

Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections

Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems

When Agents See Humans as the Outgroup: Belief-Dependent Bias in LLM-Powered Agents

Attack the Messages, Not the Agents: A Multi-round Adaptive Stealthy Tampering Framework for LLM-MAS

You Told Me to Do It: Measuring Instructional Text-induced Private Data Leakage in LLM Agents

A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory