defense 2025

TraceAegis: Securing LLM-Based Agents via Hierarchical and Behavioral Anomaly Detection

Jiahao Liu 1,2, Bonan Ruan 1, Xianglin Yang 1, Zhiwei Lin 1,2, Yan Liu 2, Yang Wang 2, Tao Wei 2, Zhenkai Liang 1

2 citations · 33 references · arXiv


Published on arXiv: 2510.11203

Excessive Agency (OWASP LLM Top 10, LLM08)

Insecure Plugin Design (OWASP LLM Top 10, LLM07)

Key Finding

TraceAegis identifies the majority of abnormal agent behaviors, covering both execution-order violations and semantic-consistency anomalies, in the healthcare and corporate procurement scenarios of TraceAegis-Bench.

Novel technique introduced: TraceAegis


LLM-based agents have demonstrated promising adaptability in real-world applications. However, these agents remain vulnerable to a wide range of attacks, such as tool poisoning and malicious instructions, that compromise their execution flow and can lead to serious consequences like data breaches and financial loss. Existing studies typically attempt to mitigate such anomalies by predefining specific rules and enforcing them at runtime to enhance safety. Yet, designing comprehensive rules is difficult, requiring extensive manual effort and still leaving gaps that result in false negatives. As agent systems evolve into complex software systems, we take inspiration from software system security and propose TraceAegis, a provenance-based analysis framework that leverages agent execution traces to detect potential anomalies. In particular, TraceAegis constructs a hierarchical structure to abstract stable execution units that characterize normal agent behaviors. These units are then summarized into constrained behavioral rules that specify the conditions necessary to complete a task. By validating execution traces against both hierarchical and behavioral constraints, TraceAegis is able to effectively detect abnormal behaviors. To evaluate the effectiveness of TraceAegis, we introduce TraceAegis-Bench, a dataset covering two representative scenarios: healthcare and corporate procurement. Each scenario includes 1,300 benign behaviors and 300 abnormal behaviors, where the anomalies either violate the agent's execution order or break the semantic consistency of its execution sequence. Experimental results demonstrate that TraceAegis achieves strong performance on TraceAegis-Bench, successfully identifying the majority of abnormal behaviors.
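To make "validating execution traces against learned constraints" concrete, here is a minimal sketch in Python. It assumes a trace can be reduced to the ordered list of tool names the agent called, and it learns only a flat "always preceded by" relation from benign traces. This is a deliberately simplified stand-in for illustration, not TraceAegis's hierarchical abstraction of stable execution units; the tool names and the functions `learn_precedence` and `check_order` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical simplification: a trace is just the ordered list of tool names
# the agent called; arguments and outputs are ignored here.
Trace = List[str]


def learn_precedence(benign_traces: List[Trace]) -> Dict[str, Set[str]]:
    """For each tool, keep the set of tools that preceded it in EVERY benign occurrence."""
    required_before: Dict[str, Set[str]] = {}
    for trace in benign_traces:
        for i, tool in enumerate(trace):
            seen = set(trace[:i])
            if tool in required_before:
                # Intersect so only prerequisites common to all occurrences survive.
                required_before[tool] &= seen
            else:
                required_before[tool] = seen
    return required_before


def check_order(trace: Trace, required_before: Dict[str, Set[str]]) -> List[Tuple[int, str, str]]:
    """Return (index, tool, reason) for every step that breaks a learned constraint."""
    violations = []
    for i, tool in enumerate(trace):
        if tool not in required_before:
            violations.append((i, tool, "tool never seen in benign traces"))
            continue
        missing = required_before[tool] - set(trace[:i])
        if missing:
            violations.append((i, tool, f"missing prerequisites: {sorted(missing)}"))
    return violations


# Hypothetical benign procurement traces.
BENIGN = [
    ["search_vendor", "get_quote", "approve_budget", "pay_vendor"],
    ["search_vendor", "get_quote", "get_quote", "approve_budget", "pay_vendor"],
]
RULES = learn_precedence(BENIGN)
```

With these hypothetical traces, checking `["search_vendor", "pay_vendor"]` flags step 1 (`pay_vendor`) as missing the prerequisites `approve_budget` and `get_quote`, i.e. a payment that skipped the quote and approval steps. Intersecting prerequisites across all benign traces keeps the learned constraints conservative: one benign trace that legitimately skips a step removes that requirement, trading fewer false positives for possible false negatives.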


Key Contributions

  • TraceAegis: a provenance-based runtime framework that abstracts agent execution into a hierarchical structure of stable units and derives behavioral rules to detect structural and semantic anomalies in LLM agent traces
  • TraceAegis-Bench: a benchmark dataset with 1,300 benign and 300 abnormal behaviors in each of two scenarios (healthcare and corporate procurement), covering execution-order and semantic-consistency violations
  • Practical validation via internal red-teaming at a technology company, demonstrating detection of real adversarial agent traces beyond the benchmark
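Semantic-consistency anomalies can occur even when the tool order looks normal, so order constraints alone do not catch them. The sketch below, assuming a hypothetical `Step` record (tool name, arguments, output), checks a trace against two hand-written procurement rules, `payee_matches_quote` and `amount_within_budget`; all names are illustrative. TraceAegis summarizes its behavioral rules from the learned execution units rather than hand-coding them, so this only shows what checking a trace against such a rule might look like.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Step:
    """One hypothetical trace event: the tool called, its arguments, and its output."""
    tool: str
    args: Dict[str, Any] = field(default_factory=dict)
    output: Dict[str, Any] = field(default_factory=dict)


# A rule inspects a whole trace and returns an explanation string if it is violated.
Rule = Callable[[List[Step]], Optional[str]]


def latest_before(trace: List[Step], tool: str, end: int) -> Optional[Step]:
    """Most recent step calling `tool` strictly before index `end`, or None."""
    for step in reversed(trace[:end]):
        if step.tool == tool:
            return step
    return None


def payee_matches_quote(trace: List[Step]) -> Optional[str]:
    # Every payment must go to the vendor returned by the preceding quote.
    for i, step in enumerate(trace):
        if step.tool != "pay_vendor":
            continue
        quote = latest_before(trace, "get_quote", i)
        if quote is None or step.args.get("vendor") != quote.output.get("vendor"):
            return f"step {i}: payee does not match the preceding quote"
    return None


def amount_within_budget(trace: List[Step]) -> Optional[str]:
    # Every payment must be covered by the preceding budget approval.
    for i, step in enumerate(trace):
        if step.tool != "pay_vendor":
            continue
        budget = latest_before(trace, "approve_budget", i)
        if budget is None or step.args.get("amount", 0) > budget.output.get("approved_amount", 0):
            return f"step {i}: payment is not covered by an approved budget"
    return None


SEMANTIC_RULES: List[Rule] = [payee_matches_quote, amount_within_budget]


def check_semantics(trace: List[Step], rules: List[Rule] = SEMANTIC_RULES) -> List[str]:
    """Run every rule and collect the explanations of those that fire."""
    return [msg for rule in rules if (msg := rule(trace)) is not None]


# Hypothetical benign trace.
BENIGN_TRACE = [
    Step("get_quote", {"item": "laptops"}, {"vendor": "Acme", "price": 9000}),
    Step("approve_budget", {"amount": 9000}, {"approved_amount": 9000}),
    Step("pay_vendor", {"vendor": "Acme", "amount": 9000}),
]
```

Replacing the final step with a payment to a different vendor (as a poisoned tool description might induce) keeps the tool sequence identical to the benign one, yet `payee_matches_quote` fires on step 2. This is the kind of anomaly that motivates checking behavioral rules in addition to execution order.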


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_time
Datasets
TraceAegis-Bench
Applications
LLM-based agents, healthcare AI systems, corporate procurement systems