survey 2026

The Landscape of Prompt Injection Threats in LLM Agents: From Taxonomy to Analysis

Peiran Wang 1, Xinfeng Li 2, Chong Xiang 3, Jinghuai Zhang 1, Ying Li 1, Lixia Zhang 1, XiaoFeng Wang 2, Yuan Tian 1

0 citations · arXiv (Cornell University)

α

Published on arXiv

2602.10453

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Many existing defenses appear effective on current benchmarks by suppressing contextual inputs, but fail to generalize to realistic agent settings where context-dependent reasoning is essential, and no single defense achieves the trustworthiness-utility-latency triad.

AgentPI

Novel technique introduced


The evolution of Large Language Models (LLMs) has resulted in a paradigm shift towards autonomous agents, necessitating robust security against Prompt Injection (PI) vulnerabilities where untrusted inputs hijack agent behaviors. This SoK presents a comprehensive overview of the PI landscape, covering attacks, defenses, and their evaluation practices. Through a systematic literature review and quantitative analysis, we establish taxonomies that categorize PI attacks by payload generation strategies (heuristic vs. optimization) and defenses by intervention stages (text, model, and execution levels). Our analysis reveals a key limitation shared by many existing defenses and benchmarks: they largely overlook context-dependent tasks, in which agents are authorized to rely on runtime environmental observations to determine actions. To address this gap, we introduce AgentPI, a new benchmark designed to systematically evaluate agent behavior under context-dependent interaction settings. Using AgentPI, we empirically evaluate representative defenses and show that no single approach can simultaneously achieve high trustworthiness, high utility, and low latency. Moreover, we show that many defenses appear effective under existing benchmarks by suppressing contextual inputs, yet fail to generalize to realistic agent settings where context-dependent reasoning is essential. This SoK distills key takeaways and open research problems, offering structured guidance for future research and practical deployment of secure LLM agents.


Key Contributions

  • Comprehensive taxonomy of PI attacks categorized by payload generation strategy (heuristic vs. optimization) and defenses by intervention stage (text, model, execution levels)
  • AgentPI benchmark designed to systematically evaluate agent behavior under context-dependent interaction settings, exposing a critical blind spot in existing benchmarks
  • Empirical finding that no single defense simultaneously achieves high trustworthiness, high utility, and low latency, and that many defenses appear effective only by suppressing contextual inputs rather than handling PI robustly

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_timetargetedblack_box
Datasets
AgentPI
Applications
llm agentsautonomous ai agents