defense 2026

Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI Agents

Hung Dang

0 citations

α

Published on arXiv

2604.26274

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Achieves 2.2% attack success rate on structured workflows vs 12.8% for Aegis baseline, with 0% ASR on context-sequential attacks and 2.2ms per-call latency

Guardrails (behavioral firewall using pDFA)

Novel technique introduced


Structured-workflow agents driven by large language models execute tool calls against sensitive external environments. We propose \codename, a telemetry-driven behavioral anomaly detection firewall. Drawing on sequence-based intrusion detection, \codename\ compiles verified benign tool-call telemetry into a parameterized deterministic finite automaton (pDFA). The model defines permitted tool sequences, sequential contexts, and parameter bounds. At runtime, a lightweight gateway enforces these boundaries via an $O(1)$ state-transition structural lookup, shifting computationally expensive analysis entirely offline. Evaluated on the Agent Security Bench (ASB), \codename\ achieves a 5.6\% macro-averaged attack success rate (ASR) across five scenarios. Within three structured workflows, ASR drops to 2.2\%, outperforming Aegis, a state-of-the-art stateless scanner, at 12.8\%. \codename\ achieves 0\% ASR on multi-step and context-sequential attacks in structured settings. Furthermore, against 1,000 algorithmically spliced exfiltration payloads, only 1.4\% matched valid structural paths, all of which failed end-to-end string parameter guards (0 successes out of 14 surviving paths, 95\% CI [0\%, 23.2\%]). \codename\ introduces just 2.2~ms of per-call latency (a 3.7$\times$ speedup over \textsc{Aegis}) while maintaining a 2.0\% benign task failure rate (BTFR) on benign workloads. Modeling the behavioral trajectory effectively collapses the available attack surface, but unmaintained continuous parameter bounds remain vulnerable to synonym-substitution attacks (18\% evasion rate). Thus, exact-match whitelisting of sensitive parameters ultimately bears the final defensive load against execution.


Key Contributions

  • Telemetry-driven behavioral firewall that compiles benign tool-call traces into a parameterized DFA for stateful sequence validation
  • O(1) runtime enforcement via pre-compiled state-transition lookup, achieving 3.7x speedup over stateless baselines
  • Achieves 2.2% ASR on structured workflows (vs 12.8% for Aegis) and 0% ASR on multi-step context-sequential attacks

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_timeblack_box
Datasets
Agent Security Bench (ASB)
Applications
ai agent tool callingmcp-based agentsclinical reasoning workflows