MAS-Shield: A Defense Framework for Secure and Efficient LLM MAS
Kaixiang Wang, Zhaojiacheng Zhou, Bunyod Suvonov, Jiong Lou, Jie Li
Published on arXiv (2511.22924)
Prompt Injection
OWASP LLM Top 10 — LLM01
Excessive Agency
OWASP LLM Top 10 — LLM08
Key Finding
MAS-Shield achieves a 92.5% recovery rate against adversarial linguistic attacks while reducing defense latency by over 70% compared to existing committee-based methods.
MAS-Shield
Novel technique introduced
Large Language Model (LLM)-based Multi-Agent Systems (MAS) are susceptible to linguistic attacks that can trigger cascading failures across the network. Existing defenses face a fundamental dilemma: lightweight single-auditor methods are prone to single points of failure, while robust committee-based approaches incur prohibitive computational costs in multi-turn interactions. To address this challenge, we propose **MAS-Shield**, a secure and efficient defense framework designed with a coarse-to-fine filtering pipeline. Rather than applying uniform scrutiny, MAS-Shield dynamically allocates defense resources through a three-stage protocol: (1) **Critical Agent Selection** strategically targets high-influence nodes to narrow the defense surface; (2) **Light Auditing** employs lightweight sentry models to rapidly filter the majority of benign cases; and (3) **Global Consensus Auditing** escalates only suspicious or ambiguous signals to a heavyweight committee for definitive arbitration. This hierarchical design effectively optimizes the security-efficiency trade-off. Experiments demonstrate that MAS-Shield achieves a 92.5% recovery rate against diverse adversarial scenarios and reduces defense latency by over 70% compared to existing methods.
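The three-stage protocol can be sketched in code. Everything below is an illustrative assumption, not the paper's implementation: the influence scores, the keyword-based sentry stand-in for a lightweight model, the score thresholds, and the majority-vote committee stub are all hypothetical placeholders for the components the abstract describes.

```python
# Hypothetical coarse-to-fine auditing pipeline in the spirit of MAS-Shield.
# All names, thresholds, and scoring heuristics are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    influence: float                    # e.g., centrality in the MAS communication graph
    outbox: list = field(default_factory=list)

def select_critical(agents, top_k=2):
    """Stage 1: narrow the defense surface to the top-k high-influence nodes."""
    return sorted(agents, key=lambda a: a.influence, reverse=True)[:top_k]

def sentry_score(message):
    """Stage 2: cheap suspicion score in [0, 1]; a toy keyword heuristic
    stands in here for the paper's lightweight sentry model."""
    triggers = ("ignore previous", "override", "exfiltrate")
    return 0.9 if any(t in message.lower() for t in triggers) else 0.1

def committee_audit(message, committee_size=3):
    """Stage 3: heavyweight consensus arbitration, stubbed as a majority vote
    of sentry calls; a real committee would query multiple strong LLMs."""
    votes = [sentry_score(message) > 0.5 for _ in range(committee_size)]
    return sum(votes) > committee_size // 2     # True = block the message

def shield(agents, low=0.2, high=0.8):
    """Run the full pipeline; return (agent, message) pairs judged malicious."""
    blocked = []
    for agent in select_critical(agents):
        for msg in agent.outbox:
            s = sentry_score(msg)
            if s < low:
                continue                         # clearly benign: fast path
            if s > high or committee_audit(msg):
                blocked.append((agent.name, msg))  # blocked directly or by committee
    return blocked
```

Note how the design concentrates cost: low-influence agents are never audited, clearly benign messages exit at the sentry stage, and only the ambiguous middle band pays for committee arbitration.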
Key Contributions
- Three-stage coarse-to-fine defense pipeline (Critical Agent Selection → Light Auditing → Global Consensus Auditing) that concentrates resources on high-influence nodes
- Lightweight sentry model stage that rapidly filters benign inter-agent messages before escalating only suspicious or ambiguous signals to a heavyweight committee, reducing defense latency by over 70%
- Empirical evaluation showing 92.5% system recovery rate against diverse adversarial linguistic attack scenarios in LLM-based MAS