Optimizing Agent Planning for Security and Autonomy
Aashish Kolluri 1, Rishi Sharma 1,2, Manuel Costa 1, Boris Köpf 1, Tobias Nießen 3, Mark Russinovich 1, Shruti Tople 1, Santiago Zanella-Béguelin 1
Published on arXiv: 2602.11416
Prompt Injection (OWASP LLM Top 10: LLM01)
Excessive Agency (OWASP LLM Top 10: LLM08)
Key Finding
Security-aware agent planning (Prudentia) lets the agent execute a larger fraction of consequential actions autonomously, without human approval, and it does so without lowering task completion rates. This suggests that deterministic information-flow control (IFC) defenses against prompt injection need not be prohibitively costly when they are paired with autonomy-aware planning.
Novel technique introduced: Prudentia
Indirect prompt injection attacks threaten AI agents that execute consequential actions, motivating deterministic system-level defenses. Such defenses can provably block unsafe actions by enforcing confidentiality and integrity policies, but currently appear costly: they reduce task completion rates and increase token usage compared to probabilistic defenses. We argue that existing evaluations miss a key benefit of system-level defenses: reduced reliance on human oversight. We introduce autonomy metrics to quantify this benefit: the fraction of consequential actions an agent can execute without human-in-the-loop (HITL) approval while preserving security. To increase autonomy, we design a security-aware agent that (i) introduces richer HITL interactions, and (ii) explicitly plans for both task progress and policy compliance. We implement this agent design atop an existing information-flow control defense against prompt injection and evaluate it on the AgentDojo and WASP benchmarks. Experiments show that this approach yields higher autonomy without sacrificing utility.
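The following sketch shows one way a deterministic IFC monitor might gate a consequential tool call, to make "enforcing confidentiality and integrity policies" concrete. The monitor joins the labels of the data flowing into a call. It lets the call run autonomously only if no untrusted data drives it and no private data reaches an external sink. Everything here is a hypothetical simplification for illustration, not the paper's defense or Prudentia's interface. That includes the two-point label lattice, `ToolCall`, `decide`, and the choice to route violations to a human instead of blocking them.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Integrity(IntEnum):
    TRUSTED = 0    # e.g. the user's own instructions
    UNTRUSTED = 1  # e.g. tool output that may carry an injected prompt


class Confidentiality(IntEnum):
    PUBLIC = 0
    PRIVATE = 1


@dataclass(frozen=True)
class Label:
    integrity: Integrity
    confidentiality: Confidentiality

    def join(self, other: "Label") -> "Label":
        # Least upper bound, per dimension, in a two-point lattice:
        # any untrusted input taints the result, any private input makes it private.
        return Label(
            Integrity(max(self.integrity, other.integrity)),
            Confidentiality(max(self.confidentiality, other.confidentiality)),
        )


BOTTOM = Label(Integrity.TRUSTED, Confidentiality.PUBLIC)


@dataclass
class ToolCall:
    """Hypothetical tool call as seen by the monitor."""
    name: str
    consequential: bool      # does the call change state outside the agent?
    arg_labels: List[Label]  # labels of the data flowing into the arguments
    external_sink: bool      # hypothetical flag: does it send data to a third party?


def decide(call: ToolCall) -> str:
    """Return "execute" if the policy allows autonomous execution, else "ask_human"."""
    if not call.consequential:
        return "execute"
    label = BOTTOM
    for arg_label in call.arg_labels:
        label = label.join(arg_label)
    # Integrity policy: untrusted data must not drive a consequential action.
    integrity_ok = label.integrity == Integrity.TRUSTED
    # Confidentiality policy: private data must not reach an external sink.
    confidentiality_ok = not (
        call.external_sink and label.confidentiality == Confidentiality.PRIVATE
    )
    if integrity_ok and confidentiality_ok:
        return "execute"
    # Assumption: violations are escalated to a human instead of blocked outright.
    return "ask_human"


tainted = Label(Integrity.UNTRUSTED, Confidentiality.PUBLIC)
print(decide(ToolCall("send_email", True, [BOTTOM], external_sink=True)))           # execute
print(decide(ToolCall("send_email", True, [BOTTOM, tainted], external_sink=True)))  # ask_human
```

In this sketch, non-consequential calls are always allowed, which reflects the abstract's focus on consequential actions. A real IFC system would also propagate labels through the model's own outputs, and this sketch omits that step.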
Key Contributions
- Autonomy metrics that quantify the fraction of consequential actions an agent can execute without human-in-the-loop approval while preserving security guarantees
- Prudentia: a security-aware agent design that combines richer HITL interactions with explicit planning for both task progress and policy compliance
- Empirical demonstration on AgentDojo and WASP that security-aware planning improves autonomy without sacrificing task utility under information-flow control defenses
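The autonomy metric from the first contribution can be read as a simple ratio:

autonomy = (consequential actions executed without HITL approval) / (all consequential actions)

The sketch below is a minimal reading of that definition. It assumes a hypothetical per-action log with `consequential` and `needed_approval` flags. The paper's exact aggregation, for example per task or per benchmark suite, may differ.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ActionRecord:
    """Hypothetical log entry for one action taken by the agent."""
    name: str
    consequential: bool    # e.g. sending email, making a payment
    needed_approval: bool  # True if a human had to approve it (HITL)


def autonomy(records: Sequence[ActionRecord]) -> Optional[float]:
    """Fraction of consequential actions executed without HITL approval.

    Returns None when there are no consequential actions, since the ratio
    is undefined in that case.
    """
    consequential = [r for r in records if r.consequential]
    if not consequential:
        return None
    autonomous = sum(1 for r in consequential if not r.needed_approval)
    return autonomous / len(consequential)


log = [
    ActionRecord("read_inbox", consequential=False, needed_approval=False),
    ActionRecord("send_email", consequential=True, needed_approval=False),
    ActionRecord("pay_invoice", consequential=True, needed_approval=True),
    ActionRecord("create_event", consequential=True, needed_approval=False),
    ActionRecord("share_file", consequential=True, needed_approval=False),
]
print(autonomy(log))  # 0.75
```

This assumes the log comes from an agent whose actions are gated by a deterministic IFC check. Under that assumption, every action counted as autonomous has already passed the check. The ratio then measures how often a human had to be consulted, not whether security held.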