Defense · 2026

AgentGuardian: Learning Access Control Policies to Govern AI Agent Behavior

Nadya Abaev, Denis Klimov, Gerard Levinov, David Mimran, Yuval Elovici, Asaf Shabtai

3 citations · 40 references · arXiv


Published on arXiv (2601.10440)

Excessive Agency

OWASP LLM Top 10 — LLM08

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

AgentGuardian effectively detects malicious or misleading inputs and unauthorized execution paths across two real-world AI agent applications while preserving normal agent functionality.

AgentGuardian

Novel technique introduced


Artificial intelligence (AI) agents are increasingly used in a variety of domains to automate tasks, interact with users, and make decisions based on data inputs. Ensuring that AI agents perform only authorized actions and handle inputs appropriately is essential for maintaining system integrity and preventing misuse. In this study, we introduce AgentGuardian, a novel security framework that governs and protects AI agent operations by enforcing context-aware access-control policies. During a controlled staging phase, the framework monitors execution traces to learn legitimate agent behaviors and input patterns. From this phase, it derives adaptive policies that regulate tool calls made by the agent, guided by both real-time input context and the control flow dependencies of multi-step agent actions. Evaluation across two real-world AI agent applications demonstrates that AgentGuardian effectively detects malicious or misleading inputs while preserving normal agent functionality. Moreover, its control-flow-based governance mechanism mitigates hallucination-driven errors and other orchestration-level malfunctions.
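The staging-then-enforcement idea in the abstract can be sketched in a few lines: observe which tool calls occur in which input contexts during a trusted staging phase, then permit only those (tool, context) pairs at runtime. This is a hedged illustration of the general pattern, not the paper's implementation; all names (`PolicyStore`, `learn`, `permits`) and the trace format are assumptions.

```python
# Hypothetical sketch of staging-phase policy learning: record legitimate
# (tool, context) pairs from execution traces, then deny anything unseen.
# Class and field names are illustrative, not taken from the paper.
from collections import defaultdict


class PolicyStore:
    def __init__(self):
        # maps each tool name to the set of context labels seen in staging
        self.allowed = defaultdict(set)

    def learn(self, traces):
        """Staging phase: record every (tool, context) pair observed."""
        for trace in traces:
            for step in trace:
                self.allowed[step["tool"]].add(step["context"])

    def permits(self, tool, context):
        """Enforcement phase: allow only behavior seen during staging."""
        return context in self.allowed.get(tool, set())


# Usage: one staging trace from a hypothetical travel-booking agent.
store = PolicyStore()
store.learn([
    [{"tool": "search_flights", "context": "travel_query"},
     {"tool": "book_flight", "context": "travel_query"}],
])
assert store.permits("book_flight", "travel_query")
assert not store.permits("send_email", "travel_query")  # never observed
```

In practice the paper additionally generalizes and clusters inputs semantically, so exact-match contexts as shown here would be replaced by learned context classes.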


Key Contributions

  • Context-aware ABAC policy learning from agent execution traces during a controlled staging phase, with semantic input generalization/clustering for practical scalability
  • Control flow graph (CFG)-based governance mechanism that restricts agent execution to verified trajectories and detects unauthorized or anomalous tool-call sequences
  • Mitigation of both adversarial (malicious input) and non-adversarial (hallucination, orchestration errors) agent failures under a unified framework
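The CFG-based governance contribution above can be illustrated as a transition check: build a graph of tool-call transitions seen in staging traces, then flag any runtime sequence that traverses an edge the graph does not contain. A minimal sketch, assuming a simple edge-set representation; function names and trace shapes are illustrative, not the paper's.

```python
# Illustrative sketch of CFG-based trajectory verification: an agent run is
# allowed only if every consecutive tool-call transition appeared in some
# staging trace. Names are assumptions, not from the paper.
def build_cfg(traces):
    """Collect the set of observed tool-call transitions (edges)."""
    edges = set()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            edges.add((a, b))
    return edges


def trajectory_allowed(cfg, sequence):
    """True iff every consecutive transition in the run was seen in staging."""
    return all((a, b) in cfg for a, b in zip(sequence, sequence[1:]))


# Usage: two staging traces from a hypothetical document-answering agent.
cfg = build_cfg([
    ["fetch_docs", "summarize", "reply"],
    ["fetch_docs", "reply"],
])
assert trajectory_allowed(cfg, ["fetch_docs", "summarize", "reply"])
assert not trajectory_allowed(cfg, ["fetch_docs", "delete_docs"])  # unverified edge
```

Because the check is on control flow rather than input content, it catches hallucination-driven detours (an edge the agent was never observed to take) as well as injected tool calls, which matches the unified adversarial/non-adversarial framing above.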

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_time, digital
Applications
ai agents, tool-using llm systems, autonomous task automation