Defense · 2026

AgentGuardian: Learning Access Control Policies to Govern AI Agent Behavior

Nadya Abaev, Denis Klimov, Gerard Levinov, David Mimran, Yuval Elovici, Asaf Shabtai

3 citations · 40 references · arXiv


Published on arXiv (2601.10440)

Excessive Agency

OWASP LLM Top 10 — LLM08

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

AgentGuardian effectively detects malicious or misleading inputs and unauthorized execution paths across two real-world AI agent applications while preserving normal agent functionality.

AgentGuardian

Novel technique introduced


Artificial intelligence (AI) agents are increasingly used in a variety of domains to automate tasks, interact with users, and make decisions based on data inputs. Ensuring that AI agents perform only authorized actions and handle inputs appropriately is essential for maintaining system integrity and preventing misuse. In this study, we introduce AgentGuardian, a novel security framework that governs and protects AI agent operations by enforcing context-aware access-control policies. During a controlled staging phase, the framework monitors execution traces to learn legitimate agent behaviors and input patterns. From this phase, it derives adaptive policies that regulate tool calls made by the agent, guided by both real-time input context and the control flow dependencies of multi-step agent actions. Evaluation across two real-world AI agent applications demonstrates that AgentGuardian effectively detects malicious or misleading inputs while preserving normal agent functionality. Moreover, its control-flow-based governance mechanism mitigates hallucination-driven errors and other orchestration-level malfunctions.
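The staging-then-enforcement idea in the abstract can be sketched in a few lines: observe which tool calls occur in which input contexts during a trusted staging phase, then permit only those (tool, context) pairs at runtime. This is a hedged illustration of the general pattern, not the paper's implementation; all names (`PolicyStore`, `learn`, `permits`) and the trace format are assumptions.

```python
# Hypothetical sketch of staging-phase policy learning: record legitimate
# (tool, context) pairs from execution traces, then deny anything unseen.
# Class and field names are illustrative, not taken from the paper.
from collections import defaultdict


class PolicyStore:
    def __init__(self):
        # maps each tool name to the set of context labels seen in staging
        self.allowed = defaultdict(set)

    def learn(self, traces):
        """Staging phase: record every (tool, context) pair observed."""
        for trace in traces:
            for step in trace:
                self.allowed[step["tool"]].add(step["context"])

    def permits(self, tool, context):
        """Enforcement phase: allow only behavior seen during staging."""
        return context in self.allowed.get(tool, set())


# Usage: one staging trace from a hypothetical travel-booking agent.
store = PolicyStore()
store.learn([
    [{"tool": "search_flights", "context": "travel_query"},
     {"tool": "book_flight", "context": "travel_query"}],
])
assert store.permits("book_flight", "travel_query")
assert not store.permits("send_email", "travel_query")  # never observed
```

In practice the paper additionally generalizes and clusters inputs semantically, so exact-match contexts as shown here would be replaced by learned context classes.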


Key Contributions

  • Context-aware ABAC policy learning from agent execution traces during a controlled staging phase, with semantic input generalization/clustering for practical scalability
  • Control flow graph (CFG)-based governance mechanism that restricts agent execution to verified trajectories and detects unauthorized or anomalous tool-call sequences
  • Mitigation of both adversarial (malicious input) and non-adversarial (hallucination, orchestration errors) agent failures under a unified framework
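The CFG-based governance contribution above can be illustrated as a transition check: build a graph of tool-call transitions seen in staging traces, then flag any runtime sequence that traverses an edge the graph does not contain. A minimal sketch, assuming a simple edge-set representation; function names and trace shapes are illustrative, not the paper's.

```python
# Illustrative sketch of CFG-based trajectory verification: an agent run is
# allowed only if every consecutive tool-call transition appeared in some
# staging trace. Names are assumptions, not from the paper.
def build_cfg(traces):
    """Collect the set of observed tool-call transitions (edges)."""
    edges = set()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            edges.add((a, b))
    return edges


def trajectory_allowed(cfg, sequence):
    """True iff every consecutive transition in the run was seen in staging."""
    return all((a, b) in cfg for a, b in zip(sequence, sequence[1:]))


# Usage: two staging traces from a hypothetical document-answering agent.
cfg = build_cfg([
    ["fetch_docs", "summarize", "reply"],
    ["fetch_docs", "reply"],
])
assert trajectory_allowed(cfg, ["fetch_docs", "summarize", "reply"])
assert not trajectory_allowed(cfg, ["fetch_docs", "delete_docs"])  # unverified edge
```

Because the check is on control flow rather than input content, it catches hallucination-driven detours (an edge the agent was never observed to take) as well as injected tool calls, which matches the unified adversarial/non-adversarial framing above.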

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_time, digital
Applications
ai agents, tool-using llm systems, autonomous task automation