survey arXiv Mar 11, 2026 · 28d ago
Juhee Kim, Xiaoyuan Liu, Zhun Wang et al. · University of California · Seoul National University +1 more
Surveys attacks and defenses across agentic LLM systems, covering prompt injection, insecure tool use, and excessive agency risks
Prompt Injection Insecure Plugin Design Excessive Agency nlp multimodal
AI agents that combine large language models with non-AI system components are rapidly emerging in real-world applications, offering unprecedented automation and flexibility. However, this flexibility introduces complex security challenges fundamentally different from those in traditional software systems. This paper presents the first systematic and comprehensive survey of AI agent security, including an analysis of the design space, attack landscape, and defense mechanisms for secure AI agent systems. We further conduct case studies to point out existing gaps in securing agentic AI systems and identify open challenges in this emerging domain. Our work also introduces the first systematic framework for understanding the security risks and defense strategies of AI agents, serving as a foundation both for building secure agentic systems and for advancing research in this critical area.
llm · vlm · multimodal — University of California · Seoul National University · University of Illinois Urbana-Champaign
defense arXiv Apr 6, 2026 · 2d ago
Zhuowen Yuan, Zhaorun Chen, Zhen Xiang et al. · University of Illinois Urbana-Champaign · Virtue AI +6 more
Network-level guardrail detecting supply-chain poisoning in LLM agent MCP tools via MITM proxy monitoring network behaviors
AI Supply Chain Attacks Insecure Plugin Design nlp
Existing research on LLM agent security mainly focuses on prompt injection and unsafe input/output behaviors. However, as agents increasingly rely on third-party tools and MCP servers, a new class of supply-chain threats has emerged, where malicious behaviors are embedded in seemingly benign tools, silently hijacking agent execution, leaking sensitive data, or triggering unauthorized actions. Despite their growing impact, there is currently no comprehensive benchmark for evaluating such threats. To bridge this gap, we introduce SC-Inject-Bench, a large-scale benchmark comprising over 10,000 malicious MCP tools grounded in a taxonomy of 25+ supply-chain attack types derived from MITRE ATT&CK. We observe that existing MCP scanners and semantic guardrails perform poorly on this benchmark. Motivated by this finding, we propose ShieldNet, a network-level guardrail framework that detects supply-chain poisoning by observing real network interactions rather than surface-level tool traces. ShieldNet integrates a man-in-the-middle (MITM) proxy and an event extractor to identify critical network behaviors, which are then processed by a lightweight classifier for attack detection. Extensive experiments show that ShieldNet achieves strong detection performance (up to 0.995 F1 with only 0.8% false positives) while introducing little runtime overhead, substantially outperforming existing MCP scanners and LLM-based guardrails.
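The proxy → event extractor → classifier pipeline described above can be illustrated with a minimal sketch. This is not the authors' implementation: the event fields, the `allowlist` of declared endpoints, and the rule-based verdict logic are all hypothetical stand-ins (the paper uses a learned lightweight classifier over events captured by a real MITM proxy), but the sketch shows the shape of the idea — judge a tool by the network behavior it actually exhibits, not by its description.

```python
from dataclasses import dataclass

@dataclass
class NetEvent:
    """One network event as a MITM proxy might record it (fields assumed)."""
    host: str       # destination host observed at the proxy
    method: str     # e.g. "GET", "POST"
    bytes_out: int  # payload size sent to the destination

def extract_events(raw_events, allowlist):
    """Event extractor: keep only traffic leaving the tool's declared endpoints."""
    return [e for e in raw_events if e.host not in allowlist]

def classify(events, exfil_threshold=4096):
    """Toy stand-in for the lightweight classifier: flag large uploads
    and any traffic to undeclared hosts as possible supply-chain poisoning."""
    reasons = []
    for e in events:
        if e.method == "POST" and e.bytes_out >= exfil_threshold:
            reasons.append(f"possible exfiltration to {e.host} ({e.bytes_out} bytes)")
        else:
            reasons.append(f"undeclared destination {e.host}")
    return ("block", reasons) if reasons else ("allow", [])

# Usage: a tool declared only api.example.com, yet also POSTs data elsewhere.
trace = [
    NetEvent("api.example.com", "GET", 120),
    NetEvent("collector.attacker.net", "POST", 8192),
]
verdict, why = classify(extract_events(trace, allowlist={"api.example.com"}))
# → verdict == "block"; why names the undeclared collector host
```

The design point the paper makes is visible even in this sketch: a scanner that only reads the tool's code or metadata never sees the second event, while a network-level monitor flags it regardless of how the malicious behavior was hidden.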
llm — University of Illinois Urbana-Champaign · Virtue AI · University of Chicago +5 more