defense arXiv Jan 17, 2026
Zimo Ji, Daoyuan Wu, Wenyuan Jiang et al. · Hong Kong University of Science and Technology · Lingnan University +3 more
Proposes SEAgent, a mandatory access control framework that blocks privilege escalation attacks in LLM agent tool use via information flow monitoring and ABAC policies
Prompt Injection · Excessive Agency · nlp
Large Language Model (LLM)-based agent systems are increasingly deployed for complex real-world tasks but remain vulnerable to natural language-based attacks that exploit over-privileged tool use. This paper aims to understand and mitigate such attacks through the lens of privilege escalation, defined as agent actions exceeding the least privilege required for a user's intended task. Based on a formal model of LLM agent systems, we identify novel privilege escalation scenarios, particularly in multi-agent systems, including a variant akin to the classic confused deputy problem. To defend against both known and newly demonstrated privilege escalation, we propose SEAgent, a mandatory access control (MAC) framework built upon attribute-based access control (ABAC). SEAgent monitors agent-tool interactions via an information flow graph and enforces customizable security policies based on entity attributes. Our evaluations show that SEAgent effectively blocks various privilege escalation attacks while maintaining a low false positive rate and negligible system overhead. This demonstrates its robustness and adaptability in securing LLM-based agent systems.
llm · Hong Kong University of Science and Technology · Lingnan University · ETH Zürich +2 more
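The SEAgent abstract describes two mechanisms: tracking agent-tool interactions in an information flow graph, and deciding tool calls with attribute-based policies. The sketch below illustrates that general idea under stated assumptions; it is not SEAgent's implementation or API. The class and function names, the `untrusted`/`privileged` attributes, and the single policy rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A node in the flow graph: a user, agent, tool, or data source."""
    name: str
    attrs: frozenset = frozenset()  # hypothetical labels, e.g. {"untrusted"}, {"privileged"}


class FlowGraph:
    """Records which entities' outputs flowed into which other entities."""

    def __init__(self):
        self.entities = {}  # name -> Entity
        self.inflows = {}   # dst name -> set of direct source names

    def add(self, entity):
        self.entities[entity.name] = entity

    def record_flow(self, src, dst):
        # Call whenever src's output becomes part of dst's context.
        self.inflows.setdefault(dst, set()).add(src)

    def upstream(self, name):
        # Every entity whose data reached `name`, directly or via other agents.
        seen, stack = set(), [name]
        while stack:
            for src in self.inflows.get(stack.pop(), ()):
                if src not in seen:
                    seen.add(src)
                    stack.append(src)
        return seen


def allow_tool_call(graph, agent, tool):
    """Hypothetical ABAC rule: an agent whose context transitively contains
    untrusted data may not invoke a tool labeled privileged."""
    if "privileged" not in graph.entities[tool].attrs:
        return True  # non-privileged tools are always allowed
    return not any(
        "untrusted" in graph.entities[src].attrs for src in graph.upstream(agent)
    )
```

Because `upstream` follows flows transitively, a main agent that only receives a helper agent's summary of an untrusted web page is still denied a privileged tool, which is loosely analogous to the multi-agent escalation the abstract mentions. A real policy engine would evaluate richer attributes per request rather than this single hard-coded rule.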
attack arXiv Dec 30, 2025
Ruixuan Huang, Qingyue Wang, Hantao Huang et al. · Hong Kong University of Science and Technology · Nanyang Technological University
Black-box DoS attack exploits MoE router imbalance via repetitive token patterns, increasing end-to-end inference latency on Mixtral-8x7B by about 3x (3.063x)
Model Denial of Service · nlp
Mixture-of-Experts architectures have become the standard for scaling large language models due to their superior parameter efficiency. To accommodate the growing number of experts in practice, modern inference systems commonly adopt expert parallelism to distribute experts across devices. However, the absence of explicit load balancing constraints during inference allows adversarial inputs to trigger severe routing concentration. We demonstrate that out-of-distribution prompts can manipulate the routing strategy such that all tokens are consistently routed to the same set of top-$k$ experts, which creates computational bottlenecks on certain devices while forcing others to idle. This converts an efficiency mechanism into a denial-of-service attack vector, leading to violations of service-level agreements for time to first token. We propose RepetitionCurse, a low-cost black-box strategy to exploit this vulnerability. By identifying a universal flaw in MoE router behavior, RepetitionCurse constructs adversarial prompts using simple repetitive token patterns in a model-agnostic manner. On widely deployed MoE models like Mixtral-8x7B, our method increases end-to-end inference latency by 3.063x, degrading service availability significantly.
llm · transformer · Hong Kong University of Science and Technology · Nanyang Technological University
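The abstract's core argument is that, under expert parallelism, the device holding the most-selected experts becomes the bottleneck when routing concentrates. The toy model below makes that arithmetic concrete; it is not RepetitionCurse and does not reproduce the reported 3.063x figure. The 8-expert, top-2 setting mirrors Mixtral-8x7B's routing, while the 4-device contiguous sharding and both routing patterns are invented for illustration. The model counts expert-token assignments only and ignores attention, communication, and capacity limits.

```python
def device_loads(routing, num_experts, num_devices):
    """Count expert-token assignments per device under expert parallelism.

    routing: one list of top-k expert indices per token.
    Assumes experts are sharded contiguously and evenly across devices.
    """
    experts_per_device = num_experts // num_devices
    loads = [0] * num_devices
    for top_k in routing:
        for expert in top_k:
            loads[expert // experts_per_device] += 1
    return loads


def imbalance(loads):
    """Max device load over mean load: 1.0 is perfectly balanced;
    num_devices means every assignment landed on one device."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


NUM_EXPERTS, TOP_K, NUM_DEVICES, NUM_TOKENS = 8, 2, 4, 8

# Well-spread routing: token i goes to experts i, i+1, ... (mod 8).
spread = [[(i + j) % NUM_EXPERTS for j in range(TOP_K)] for i in range(NUM_TOKENS)]
# Concentrated routing: every token picks the same top-k experts.
concentrated = [list(range(TOP_K)) for _ in range(NUM_TOKENS)]

print(device_loads(spread, NUM_EXPERTS, NUM_DEVICES))        # [4, 4, 4, 4]
print(device_loads(concentrated, NUM_EXPERTS, NUM_DEVICES))  # [16, 0, 0, 0]
print(imbalance(device_loads(concentrated, NUM_EXPERTS, NUM_DEVICES)))  # 4.0
```

With the same total work, concentrated routing puts four times the mean load on one device while the other three idle, which is the "bottleneck on certain devices while forcing others to idle" effect the abstract describes.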
survey arXiv Nov 19, 2025
Zimo Ji, Xunguang Wang, Zongjie Li et al. · The Hong Kong University of Science and Technology · Zhejiang University of Technology +3 more
SoK paper taxonomizes IPI defenses for LLM agents, identifies six bypass root causes, and proposes three novel adaptive attacks
Prompt Injection · nlp
Large Language Model (LLM)-based agents with function-calling capabilities are increasingly deployed, but remain vulnerable to Indirect Prompt Injection (IPI) attacks that hijack their tool calls. In response, numerous IPI-centric defense frameworks have emerged. However, these defenses are fragmented, lacking a unified taxonomy and comprehensive evaluation. In this Systematization of Knowledge (SoK), we present the first comprehensive analysis of IPI-centric defense frameworks. We introduce a taxonomy that classifies these defenses along five dimensions. We then thoroughly assess the security and usability of representative defense frameworks. Through analysis of defensive failures in the assessment, we identify six root causes of defense circumvention. Based on these findings, we design three novel adaptive attacks that significantly improve attack success rates against specific frameworks, demonstrating the severity of the flaws in these defenses. Our paper provides a foundation and critical insights for the future development of more secure and usable IPI-centric agent defense frameworks.
llm · The Hong Kong University of Science and Technology · Zhejiang University of Technology · Lingnan University +2 more