defense 2026

Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI Agents

Hung Dang

Van Lang University

0 citations

Published on arXiv

2604.26274

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Achieves 2.2% attack success rate on structured workflows vs 12.8% for Aegis baseline, with 0% ASR on context-sequential attacks and 2.2ms per-call latency

Guardrails (behavioral firewall using pDFA)

Novel technique introduced

Structured-workflow agents driven by large language models execute tool calls against sensitive external environments. We propose a telemetry-driven behavioral anomaly detection firewall. Drawing on sequence-based intrusion detection, the firewall compiles verified benign tool-call telemetry into a parameterized deterministic finite automaton (pDFA). The model defines permitted tool sequences, sequential contexts, and parameter bounds. At runtime, a lightweight gateway enforces these boundaries via an O(1) state-transition lookup, shifting computationally expensive analysis entirely offline. Evaluated on the Agent Security Bench (ASB), the firewall achieves a 5.6% macro-averaged attack success rate (ASR) across five scenarios. Within three structured workflows, ASR drops to 2.2%, outperforming Aegis, a state-of-the-art stateless scanner, at 12.8%. The firewall achieves 0% ASR on multi-step and context-sequential attacks in structured settings. Furthermore, against 1,000 algorithmically spliced exfiltration payloads, only 1.4% matched valid structural paths, and all of those were rejected by the end-to-end string parameter guards (0 successes out of 14 surviving paths, 95% CI [0%, 23.2%]). The firewall adds just 2.2 ms of per-call latency (a 3.7x speedup over Aegis) while maintaining a 2.0% benign task failure rate (BTFR). Modeling the behavioral trajectory collapses the available attack surface, but unmaintained continuous parameter bounds remain vulnerable to synonym-substitution attacks (18% evasion rate). Exact-match whitelisting of sensitive parameters therefore bears the final defensive load against execution.
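The reported interval for the spliced-payload experiment can be checked with a few lines of arithmetic. The abstract does not say which interval method was used; the sketch below assumes a two-sided exact (Clopper-Pearson) binomial interval, which for zero observed successes has a closed-form upper bound and reproduces the stated 23.2%. The function name is hypothetical.

```python
def exact_upper_bound_zero_successes(n: int, confidence: float = 0.95) -> float:
    """Upper limit of the two-sided Clopper-Pearson interval when 0 of n trials succeed.

    With zero successes the lower limit is 0 and the upper limit solves
    (1 - p) ** n == alpha / 2, giving p = 1 - (alpha / 2) ** (1 / n).
    """
    alpha = 1.0 - confidence
    return 1.0 - (alpha / 2.0) ** (1.0 / n)


payloads = 1000          # spliced exfiltration payloads, from the abstract
surviving_paths = 14     # payloads that matched a valid structural path
successes = 0            # payloads that also passed the string parameter guards

structural_match_rate = surviving_paths / payloads
upper = exact_upper_bound_zero_successes(surviving_paths)

print(f"structural match rate: {structural_match_rate:.1%}")
print(f"95% CI on end-to-end success among survivors: [0.0%, {upper:.1%}]")
```

The wide upper bound follows from the small denominator: only 14 payloads reached the parameter guards, so zero observed successes still leaves considerable uncertainty about the true success rate.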

Key Contributions

Telemetry-driven behavioral firewall that compiles benign tool-call traces into a parameterized DFA for stateful sequence validation
O(1) runtime enforcement via pre-compiled state-transition lookup, achieving 3.7x speedup over stateless baselines
Achieves 2.2% ASR on structured workflows (vs 12.8% for Aegis) and 0% ASR on multi-step and context-sequential attacks in structured settings
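The contributions above can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes a first-order model in which the automaton state is simply the previous tool name, numeric parameters are guarded by the min/max range seen in benign traces, and string parameters are guarded by exact-match whitelists. All names (compile_pdfa, Gateway, Guard, and the tool names in the example traces) are hypothetical.

```python
from dataclasses import dataclass, field

START = "<start>"  # assumed initial state before any tool call


@dataclass
class Guard:
    """Parameter guard learned for one (state, tool) transition."""
    numeric: dict = field(default_factory=dict)  # param name -> (lo, hi)
    strings: dict = field(default_factory=dict)  # param name -> set of allowed strings

    def learn(self, params):
        for name, value in params.items():
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                lo, hi = self.numeric.get(name, (value, value))
                self.numeric[name] = (min(lo, value), max(hi, value))
            else:
                self.strings.setdefault(name, set()).add(str(value))

    def allows(self, params):
        for name, value in params.items():
            if name in self.numeric:
                if isinstance(value, bool) or not isinstance(value, (int, float)):
                    return False
                lo, hi = self.numeric[name]
                if not lo <= value <= hi:
                    return False
            elif name in self.strings:
                # Exact-match whitelist: no fuzzy or semantic matching.
                if not isinstance(value, str) or value not in self.strings[name]:
                    return False
            else:
                return False  # parameter never seen on this transition
        return True


def compile_pdfa(traces):
    """Offline step: fold benign traces into a (state, tool) -> Guard table."""
    table = {}
    for trace in traces:
        state = START
        for tool, params in trace:
            table.setdefault((state, tool), Guard()).learn(params)
            state = tool
    return table


class Gateway:
    """Runtime step: one dictionary lookup per tool call (O(1) on average)."""

    def __init__(self, table):
        self.table = table
        self.state = START

    def check(self, tool, params):
        guard = self.table.get((self.state, tool))
        if guard is None or not guard.allows(params):
            return False  # block the call and keep the current state
        self.state = tool
        return True


# Hypothetical benign telemetry for a three-step workflow.
benign_traces = [
    [("lookup_patient", {"patient_id": "p-17"}),
     ("read_chart", {"patient_id": "p-17", "max_rows": 20}),
     ("send_summary", {"recipient": "ward@example.org"})],
    [("lookup_patient", {"patient_id": "p-17"}),
     ("read_chart", {"patient_id": "p-17", "max_rows": 50}),
     ("send_summary", {"recipient": "ward@example.org"})],
]
table = compile_pdfa(benign_traces)
```

In this sketch, a call to send_summary immediately after lookup_patient is rejected because that transition never occurs in the benign traces, and a send_summary call with an unseen recipient is rejected by the whitelist even when the sequence is valid. A real deployment would need a richer notion of state than the previous tool name.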

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

inference_time

black_box

Datasets

Agent Security Bench (ASB)

Applications

ai agent tool calling

mcp-based agents

clinical reasoning workflows

Similar Papers

Securing AI Agent Execution

From Tool Orchestration to Code Execution: A Study of MCP Design Choices

Autonomous Action Runtime Management (AARM): A System Specification for Securing AI-Driven Actions at Runtime

Tracking Capabilities for Safer Agents

From Craft to Kernel: A Governance-First Execution Architecture and Semantic ISA for Agentic Computers

MiniScope: A Least Privilege Framework for Authorizing Tool Calling Agents

Agentic JWT: A Secure Delegation Protocol for Autonomous AI Agents

TraceAegis: Securing LLM-Based Agents via Hierarchical and Behavioral Anomaly Detection