defense 2026

SafeAgent: A Runtime Protection Architecture for Agentic Systems

Hailin Liu 1, Eugene Ilyushin 1,2, Jie Ni 1, Min Zhu 1

0 citations

α

Published on arXiv

2604.17562

Prompt Injection

OWASP LLM Top 10 — LLM01

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Consistently improves robustness over baseline and text-level guardrail methods while maintaining competitive benign-task performance on ASB and InjecAgent benchmarks

SafeAgent

Novel technique introduced


Large language model (LLM) agents are vulnerable to prompt-injection attacks that propagate through multi-step workflows, tool interactions, and persistent context, making input-output filtering alone insufficient for reliable protection. This paper presents SafeAgent, a runtime security architecture that treats agent safety as a stateful decision problem over evolving interaction trajectories. The proposed design separates execution governance from semantic risk reasoning through two coordinated components: a runtime controller that mediates actions around the agent loop and a context-aware decision core that operates over persistent session state. The core is formalized as a context-aware advanced machine intelligence and instantiated through operators for risk encoding, utility-cost evaluation, consequence modeling, policy arbitration, and state synchronization. Experiments on Agent Security Bench (ASB) and InjecAgent show that SafeAgent consistently improves robustness over baseline and text-level guardrail methods while maintaining competitive benign-task performance. Ablation studies further show that recovery confidence and policy weighting determine distinct safety-utility operating points.


Key Contributions

  • Runtime controller architecture that mediates actions around the agent loop with stateful decision-making over interaction trajectories
  • Context-aware decision core formalized as advanced machine intelligence with risk encoding, utility-cost evaluation, and policy arbitration operators
  • Demonstrated robustness improvements on Agent Security Bench (ASB) and InjecAgent while maintaining benign-task performance

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_time
Datasets
Agent Security Bench (ASB)InjecAgent
Applications
agentic ai systemsmulti-step llm workflowstool-calling agents