defense 2026

AgentVisor: Defending LLM Agents Against Prompt Injection via Semantic Virtualization

Zonghao Ying ¹, Haozheng Wang ¹, Jiangfan Liu ¹, Quanchen Zou ², Aishan Liu ¹, Jian Yang ¹, Yaodong Yang ³, Xianglong Liu ¹

¹ Beihang University

² 360 AI Security Lab

³ Peking University

0 citations

Published on arXiv

2604.24118

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Reduces attack success rate to 0.65% with only 1.45% average utility loss compared to no defense baseline

AgentVisor

Novel technique introduced

Large Language Model (LLM) agents are increasingly used to automate complex workflows, but integrating untrusted external data with privileged execution exposes them to severe security risks, particularly direct and indirect prompt injection. Existing defenses face significant challenges in balancing security with utility, often encountering a trade-off where rigorous protection leads to over-defense, or where subtle indirect injections bypass detection. Drawing inspiration from operating system virtualization, we propose AgentVisor, a novel defense framework that enforces semantic privilege separation. AgentVisor treats the target agent as an untrusted guest and intercepts tool calls via a trusted semantic visor. Central to our approach is a rigorous audit protocol grounded in classic OS security primitives, designed to systematically mitigate both direct and indirect injection attacks. Furthermore, we introduce a one-shot self-correction mechanism that transforms security violations into constructive feedback, enabling agents to recover from attacks. Extensive experiments show that AgentVisor reduces the attack success rate to 0.65%, achieving this strong defense while incurring only a 1.45% average decrease in utility relative to the No Defense scenario, demonstrating superior performance compared to existing defense methods.

Key Contributions

AgentVisor framework applying OS virtualization concepts (privilege separation, policy enforcement, exception injection) to LLM agent security
STI (Suitability, Taint, Integrity) audit protocol adapting OS security primitives to semantic space for systematic prompt injection defense
One-shot self-correction mechanism that converts security violations into feedback for agent recovery

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

inference_time

Applications

llm agentstool-using agentsworkflow automation

Read PDF arXiv

AgentVisor: Defending LLM Agents Against Prompt Injection via Semantic Virtualization

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

Agent-Sentry: Bounding LLM Agents via Execution Provenance

Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening

Securing AI Agents: Implementing Role-Based Access Control for Industrial Applications

The LLMbda Calculus: AI Agents, Conversations, and Information Flow

AgentSentinel: An End-to-End and Real-Time Security Defense Framework for Computer-Use Agents

Policy Compiler for Secure Agentic Systems

A2AS: Agentic AI Runtime Security and Self-Defense

Optimizing Agent Planning for Security and Autonomy