
AgentArmor: Enforcing Program Analysis on Agent Runtime Trace to Defend Against Prompt Injection

Peiran Wang , Yang Liu , Yunfei Lu , Yifeng Cai , Hongbo Chen , Qingyou Yang , Jie Zhang , Jue Hong , Ye Wu



Published on arXiv: 2508.01249

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Reduces agent attack success rate to 3% on AgentDojo with only 1% utility drop, outperforming prior prompt-injection defenses.

AgentArmor

Novel technique introduced


Large Language Model (LLM) agents offer a powerful new paradigm for solving diverse problems by combining natural language reasoning with the execution of external tools. However, their dynamic and non-transparent behavior introduces critical security risks, particularly prompt injection attacks. In this work, we propose a novel insight: agent runtime traces can be treated as structured programs with analyzable semantics. Building on this insight, we present AgentArmor, a program analysis framework that converts agent traces into graph-based intermediate representations of program dependencies (e.g., CFG, DFG, and PDG) and enforces security policies via a type system. AgentArmor consists of three key components: (1) a graph constructor that reconstructs the agent's runtime traces as graph-based intermediate representations capturing control and data flow; (2) a property registry that attaches security-relevant metadata to the tools and data the agent interacts with; and (3) a type system that performs static inference and checking over the intermediate representation. By representing agent behavior as structured programs, AgentArmor enables program analysis of sensitive data flows, trust boundaries, and policy violations. On the AgentDojo benchmark, AgentArmor reduces the attack success rate (ASR) to 3%, with a utility drop of only 1%.


Key Contributions

  • Graph constructor that converts LLM agent runtime traces into CFG/DFG/PDG intermediate representations capturing control and data flow.
  • Property registry that attaches security-relevant metadata (trust labels, sensitivity) to tools and data interacted with by the agent.
  • Type system performing static inference and policy checking over the graph IR to detect prompt injection attempts and sensitive data-flow violations.
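The core idea behind these components can be sketched in a few lines: model the agent trace as a dependency graph whose nodes carry trust labels, propagate "untrusted" along data-flow edges, and flag any sensitive tool call reachable from untrusted input. This is a minimal illustration of taint-style analysis over a trace graph, not the paper's actual API; all class and node names below are hypothetical.

```python
# Illustrative sketch (names are hypothetical, not AgentArmor's API):
# an agent trace as a data-flow graph with trust labels, checked for
# untrusted data reaching a sensitive tool call.

UNTRUSTED = "untrusted"
TRUSTED = "trusted"

class TraceGraph:
    def __init__(self):
        self.labels = {}        # node -> trust label
        self.edges = {}         # node -> set of data-flow successors
        self.sensitive = set()  # nodes representing sensitive tool calls

    def add_node(self, name, label=TRUSTED, sensitive=False):
        self.labels[name] = label
        self.edges.setdefault(name, set())
        if sensitive:
            self.sensitive.add(name)

    def add_flow(self, src, dst):
        self.edges[src].add(dst)

    def violations(self):
        # Propagate the untrusted label along edges to a fixed point,
        # then report sensitive nodes reachable from untrusted input.
        tainted = {n for n, lbl in self.labels.items() if lbl == UNTRUSTED}
        frontier = list(tainted)
        while frontier:
            node = frontier.pop()
            for succ in self.edges.get(node, ()):
                if succ not in tainted:
                    tainted.add(succ)
                    frontier.append(succ)
        return sorted(tainted & self.sensitive)

g = TraceGraph()
g.add_node("user_prompt")
g.add_node("web_page", label=UNTRUSTED)   # tool output: attacker-controlled
g.add_node("llm_plan")
g.add_node("send_email", sensitive=True)  # high-impact tool call
g.add_flow("user_prompt", "llm_plan")
g.add_flow("web_page", "llm_plan")        # injected content reaches the plan
g.add_flow("llm_plan", "send_email")
print(g.violations())  # ['send_email']
```

Here the injected web-page content taints the plan, which in turn reaches the email tool, so the policy check flags `send_email`; AgentArmor's actual type system performs this kind of inference over richer CFG/DFG/PDG structure and per-tool metadata.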

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_time, black_box
Datasets
AgentDojo
Applications
llm agents, tool-using ai agents