Beyond Input Guardrails: Reconstructing Cross-Agent Semantic Flows for Execution-Aware Attack Detection
Yangyang Wei 1, Yijie Xu 1, Zhenyuan Li 1,1, Xiangmin Shen 2, Shouling Ji 1
Published on arXiv
2603.04469
Prompt Injection
OWASP LLM Top 10 — LLM01
Excessive Agency
OWASP LLM Top 10 — LLM08
Key Finding
MAScope achieves F1-scores of 85.3% and 66.7% for node-level and path-level detection of compound attack vectors, including indirect prompt injection, across multi-agent LLM systems.
MAScope
Novel technique introduced
Multi-agent systems (MAS) are emerging as the de facto standard for complex task orchestration. However, their reliance on autonomous execution and unstructured inter-agent communication introduces severe risks, such as indirect prompt injection, that easily circumvent conventional input guardrails. To address this, we propose MAScope, a framework that shifts the defensive paradigm from static input filtering to execution-aware analysis. By extracting and reconstructing Cross-Agent Semantic Flows, MAScope synthesizes fragmented operational primitives into contiguous behavioral trajectories, enabling a holistic view of system activity. We leverage a Supervisor LLM to scrutinize these trajectories, identifying anomalies across data flow violations, control flow deviations, and intent inconsistencies. Empirical evaluations demonstrate that MAScope effectively detects over ten distinct compound attack vectors, achieving F1-scores of 85.3% and 66.7% for node-level and path-level end-to-end attack detection, respectively. The source code is available at https://anonymous.4open.science/r/MAScope-71DC.
Key Contributions
- Cross-Agent Semantic Flow reconstruction that synthesizes fragmented inter-agent operational primitives into contiguous behavioral trajectories for holistic MAS monitoring
- Supervisor LLM-based anomaly detection across data flow violations, control flow deviations, and intent inconsistencies in multi-agent pipelines
- Empirical evaluation detecting over 10 distinct compound attack vectors with F1-scores of 85.3% (node-level) and 66.7% (path-level) for end-to-end attack detection
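To make the first contribution more concrete, the following is a minimal sketch of how fragmented per-agent operations might be stitched into contiguous trajectories. It is an illustration under stated assumptions, not MAScope's actual implementation: the `Primitive` record, its field names, and the linking rule (an operation depends on whichever earlier operation last wrote a data item it reads) are all hypothetical, and the Supervisor LLM stage that would inspect the resulting trajectories is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    """One logged operation by one agent (hypothetical schema)."""
    step: int              # global order in which the operation was logged
    agent: str             # agent that performed the operation
    action: str            # e.g. a tool call or a message send
    reads: frozenset       # ids of data items consumed
    writes: frozenset      # ids of data items produced


def reconstruct_flows(primitives):
    """Link primitives by read-after-write dependencies and return
    every root-to-leaf path as one trajectory (assumed linking rule)."""
    ordered = sorted(primitives, key=lambda p: p.step)
    last_writer = {}                  # data id -> primitive that last wrote it
    successors = defaultdict(list)    # primitive -> primitives that depend on it
    has_predecessor = set()

    for p in ordered:
        for item in sorted(p.reads):
            src = last_writer.get(item)
            if src is not None and p not in successors[src]:
                successors[src].append(p)
                has_predecessor.add(p)
        for item in p.writes:
            last_writer[item] = p

    flows = []

    def walk(node, path):
        path = path + [node]
        nxt = successors.get(node, [])
        if not nxt:                   # leaf: the trajectory is complete
            flows.append(path)
        for n in nxt:
            walk(n, path)

    for p in ordered:
        if p not in has_predecessor:  # roots start a new trajectory
            walk(p, [])
    return flows


# Hypothetical trace: a fetched web page flows across three agents,
# while an unrelated calendar lookup forms its own trajectory.
trace = [
    Primitive(1, "browser_agent", "fetch_page", frozenset(), frozenset({"page"})),
    Primitive(2, "planner_agent", "summarize", frozenset({"page"}), frozenset({"plan"})),
    Primitive(3, "mail_agent", "send_email", frozenset({"plan"}), frozenset()),
    Primitive(4, "calendar_agent", "list_events", frozenset(), frozenset({"events"})),
]
flows = reconstruct_flows(trace)
labels = [[f"{p.agent}.{p.action}" for p in flow] for flow in flows]
```

In this toy trace, `labels` contains one three-step trajectory that crosses the browser, planner, and mail agents, plus a separate single-step calendar trajectory. Viewing the first as one path is what would let a downstream checker notice that untrusted web content ultimately drives an email send, which per-agent input filtering would not see.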