defense 2025

TraceAegis: Securing LLM-Based Agents via Hierarchical and Behavioral Anomaly Detection

Jiahao Liu ¹,², Bonan Ruan ¹, Xianglin Yang ¹, Zhiwei Lin ¹,², Yan Liu ², Yang Wang ², Tao Wei ², Zhenkai Liang ¹

¹ National University of Singapore

² Ant Group

2 citations · 33 references · arXiv

Published on arXiv

2510.11203

Excessive Agency

OWASP LLM Top 10 — LLM08

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Key Finding

TraceAegis successfully identifies the majority of abnormal agent behaviors, both execution-order violations and semantic-consistency anomalies, across the healthcare and corporate procurement scenarios of TraceAegis-Bench.

TraceAegis

Novel technique introduced

LLM-based agents have demonstrated promising adaptability in real-world applications. However, these agents remain vulnerable to a wide range of attacks, such as tool poisoning and malicious instructions, that compromise their execution flow and can lead to serious consequences like data breaches and financial loss. Existing studies typically attempt to mitigate such anomalies by predefining specific rules and enforcing them at runtime to enhance safety. Yet, designing comprehensive rules is difficult, requiring extensive manual effort and still leaving gaps that result in false negatives. As agent systems evolve into complex software systems, we take inspiration from software system security and propose TraceAegis, a provenance-based analysis framework that leverages agent execution traces to detect potential anomalies. In particular, TraceAegis constructs a hierarchical structure to abstract stable execution units that characterize normal agent behaviors. These units are then summarized into constrained behavioral rules that specify the conditions necessary to complete a task. By validating execution traces against both hierarchical and behavioral constraints, TraceAegis is able to effectively detect abnormal behaviors. To evaluate the effectiveness of TraceAegis, we introduce TraceAegis-Bench, a dataset covering two representative scenarios: healthcare and corporate procurement. Each scenario includes 1,300 benign behaviors and 300 abnormal behaviors, where the anomalies either violate the agent's execution order or break the semantic consistency of its execution sequence. Experimental results demonstrate that TraceAegis achieves strong performance on TraceAegis-Bench, successfully identifying the majority of abnormal behaviors.
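The two checks described in the abstract, execution order and semantic consistency, can be illustrated with a minimal Python sketch. This is not the authors' implementation: a flat set of tool-to-tool transitions learned from benign traces stands in for the hierarchical structure, and a single "argument must stay the same across steps" rule stands in for the behavioral rules. The names `Step`, `learn_transitions`, `order_violations`, `consistency_violations`, and `check`, and the tool names in the example traces, are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One tool call in an agent execution trace (hypothetical schema)."""
    tool: str
    args: dict = field(default_factory=dict)


def _tool_sequence(trace):
    # Pad with sentinels so the first and last tools are also constrained.
    return ["<start>"] + [step.tool for step in trace] + ["<end>"]


def learn_transitions(benign_traces):
    """Collect every (previous tool, next tool) pair seen in benign traces."""
    allowed = set()
    for trace in benign_traces:
        tools = _tool_sequence(trace)
        allowed.update(zip(tools, tools[1:]))
    return allowed


def order_violations(trace, allowed):
    """Return the transitions in `trace` that never occurred in benign traces."""
    tools = _tool_sequence(trace)
    return [pair for pair in zip(tools, tools[1:]) if pair not in allowed]


def consistency_violations(trace, key):
    """Return indices of steps whose `key` argument differs from the first value seen."""
    expected = None
    bad = []
    for i, step in enumerate(trace):
        if key not in step.args:
            continue
        if expected is None:
            expected = step.args[key]
        elif step.args[key] != expected:
            bad.append(i)
    return bad


def check(trace, allowed, key):
    """Run both checks; a trace is treated as abnormal if either list is non-empty."""
    return {
        "order": order_violations(trace, allowed),
        "semantic": consistency_violations(trace, key),
    }


# Hypothetical healthcare traces, for illustration only.
benign = [[
    Step("lookup_patient", {"patient_id": "p1"}),
    Step("check_allergies", {"patient_id": "p1"}),
    Step("prescribe", {"patient_id": "p1"}),
]]
allowed = learn_transitions(benign)

# Execution-order anomaly: the allergy check is skipped.
skipped = [
    Step("lookup_patient", {"patient_id": "p1"}),
    Step("prescribe", {"patient_id": "p1"}),
]

# Semantic-consistency anomaly: the prescription targets a different patient.
swapped = [
    Step("lookup_patient", {"patient_id": "p1"}),
    Step("check_allergies", {"patient_id": "p1"}),
    Step("prescribe", {"patient_id": "p2"}),
]

print(check(skipped, allowed, "patient_id"))
print(check(swapped, allowed, "patient_id"))
```

In this sketch the first trace is flagged only by the order check and the second only by the consistency check, which mirrors the two anomaly classes in TraceAegis-Bench. A transition set learned from a handful of traces would over-flag legitimate variation, so a real system needs the more stable abstraction the paper describes.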

Key Contributions

TraceAegis: a provenance-based runtime framework that abstracts agent execution into a hierarchical structure of stable units and derives behavioral rules to detect structural and semantic anomalies in LLM agent traces
TraceAegis-Bench: a benchmark dataset covering two scenarios, healthcare and corporate procurement, each with 1,300 benign and 300 abnormal behaviors, where the anomalies are execution-order or semantic-consistency violations
Practical validation via internal red-teaming at a technology company, demonstrating detection of real adversarial agent traces beyond the benchmark

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

inference_time

Datasets

TraceAegis-Bench

Applications

llm-based agents, healthcare ai systems, corporate procurement systems

Similar Papers

MiniScope: A Least Privilege Framework for Authorizing Tool Calling Agents

Agentic JWT: A Secure Delegation Protocol for Autonomous AI Agents

Autonomous Action Runtime Management (AARM): A System Specification for Securing AI-Driven Actions at Runtime

Tracking Capabilities for Safer Agents

From Tool Orchestration to Code Execution: A Study of MCP Design Choices

Securing AI Agent Execution

Execution Is the New Attack Surface: Survivability-Aware Agentic Crypto Trading with OpenClaw-Style Local Executors

Authenticated Workflows: A Systems Approach to Protecting Agentic AI