MAS-Shield: A Defense Framework for Secure and Efficient LLM MAS
Kaixiang Wang, Zhaojiacheng Zhou, Bunyod Suvonov, Jiong Lou, Jie Li
Published on arXiv (2511.22924)
Prompt Injection
OWASP LLM Top 10 — LLM01
Excessive Agency
OWASP LLM Top 10 — LLM08
Key Finding
MAS-Shield achieves a 92.5% recovery rate against adversarial linguistic attacks while reducing defense latency by over 70% compared to existing committee-based methods.
MAS-Shield
Novel technique introduced
Large Language Model (LLM)-based Multi-Agent Systems (MAS) are susceptible to linguistic attacks that can trigger cascading failures across the network. Existing defenses face a fundamental dilemma: lightweight single-auditor methods are prone to single points of failure, while robust committee-based approaches incur prohibitive computational costs in multi-turn interactions. To address this challenge, we propose **MAS-Shield**, a secure and efficient defense framework designed with a coarse-to-fine filtering pipeline. Rather than applying uniform scrutiny, MAS-Shield dynamically allocates defense resources through a three-stage protocol: (1) **Critical Agent Selection** strategically targets high-influence nodes to narrow the defense surface; (2) **Light Auditing** employs lightweight sentry models to rapidly filter the majority of benign cases; and (3) **Global Consensus Auditing** escalates only suspicious or ambiguous signals to a heavyweight committee for definitive arbitration. This hierarchical design effectively optimizes the security-efficiency trade-off. Experiments demonstrate that MAS-Shield achieves a 92.5% recovery rate against diverse adversarial scenarios and reduces defense latency by over 70% compared to existing methods.
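The three-stage protocol can be sketched in code. Everything below is an illustrative assumption, not the paper's implementation: the influence scores, the keyword-based sentry stand-in for a lightweight model, the score thresholds, and the majority-vote committee stub are all hypothetical placeholders for the components the abstract describes.

```python
# Hypothetical coarse-to-fine auditing pipeline in the spirit of MAS-Shield.
# All names, thresholds, and scoring heuristics are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    influence: float                    # e.g., centrality in the MAS communication graph
    outbox: list = field(default_factory=list)

def select_critical(agents, top_k=2):
    """Stage 1: narrow the defense surface to the top-k high-influence nodes."""
    return sorted(agents, key=lambda a: a.influence, reverse=True)[:top_k]

def sentry_score(message):
    """Stage 2: cheap suspicion score in [0, 1]; a toy keyword heuristic
    stands in here for the paper's lightweight sentry model."""
    triggers = ("ignore previous", "override", "exfiltrate")
    return 0.9 if any(t in message.lower() for t in triggers) else 0.1

def committee_audit(message, committee_size=3):
    """Stage 3: heavyweight consensus arbitration, stubbed as a majority vote
    of sentry calls; a real committee would query multiple strong LLMs."""
    votes = [sentry_score(message) > 0.5 for _ in range(committee_size)]
    return sum(votes) > committee_size // 2     # True = block the message

def shield(agents, low=0.2, high=0.8):
    """Run the full pipeline; return (agent, message) pairs judged malicious."""
    blocked = []
    for agent in select_critical(agents):
        for msg in agent.outbox:
            s = sentry_score(msg)
            if s < low:
                continue                         # clearly benign: fast path
            if s > high or committee_audit(msg):
                blocked.append((agent.name, msg))  # blocked directly or by committee
    return blocked
```

Note how the design concentrates cost: low-influence agents are never audited, clearly benign messages exit at the sentry stage, and only the ambiguous middle band pays for committee arbitration.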
Key Contributions
- Three-stage coarse-to-fine defense pipeline (Critical Agent Selection → Light Auditing → Global Consensus Auditing) that concentrates resources on high-influence nodes
- Lightweight sentry model stage that rapidly filters benign inter-agent messages before escalating only suspicious or ambiguous signals to a heavyweight committee, reducing defense latency by over 70%
- Empirical evaluation showing 92.5% system recovery rate against diverse adversarial linguistic attack scenarios in LLM-based MAS