defense 2025

MAS-Shield: A Defense Framework for Secure and Efficient LLM MAS

Kaixiang Wang , Zhaojiacheng Zhou , Bunyod Suvonov , Jiong Lou , Jie LI

1 citations · 34 references · arXiv

α

Published on arXiv

2511.22924

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

MAS-Shield achieves a 92.5% recovery rate against adversarial linguistic attacks while reducing defense latency by over 70% compared to existing committee-based methods.

MAS-Shield

Novel technique introduced


Large Language Model (LLM)-based Multi-Agent Systems (MAS) are susceptible to linguistic attacks that can trigger cascading failures across the network. Existing defenses face a fundamental dilemma: lightweight single-auditor methods are prone to single points of failure, while robust committee-based approaches incur prohibitive computational costs in multi-turn interactions. To address this challenge, we propose \textbf{MAS-Shield}, a secure and efficient defense framework designed with a coarse-to-fine filtering pipeline. Rather than applying uniform scrutiny, MAS-Shield dynamically allocates defense resources through a three-stage protocol: (1) \textbf{Critical Agent Selection } strategically targets high-influence nodes to narrow the defense surface; (2) \textbf{Light Auditing} employs lightweight sentry models to rapidly filter the majority of benign cases; and (3) \textbf{Global Consensus Auditing} escalates only suspicious or ambiguous signals to a heavyweight committee for definitive arbitration. This hierarchical design effectively optimizes the security-efficiency trade-off. Experiments demonstrate that MAS-Shield achieves a 92.5\% recovery rate against diverse adversarial scenarios and reduces defense latency by over 70\% compared to existing methods.


Key Contributions

  • Three-stage coarse-to-fine defense pipeline (Critical Agent Selection → Light Auditing → Global Consensus Auditing) that concentrates resources on high-influence nodes
  • Lightweight sentry model stage that rapidly filters benign inter-agent messages before escalating only suspicious signals to a heavyweight committee, reducing auditing overhead by >70%
  • Empirical evaluation showing 92.5% system recovery rate against diverse adversarial linguistic attack scenarios in LLM-based MAS

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_timeblack_box
Applications
llm multi-agent systemsmulti-agent communication security