MCP-Guard: A Multi-Stage Defense-in-Depth Framework for Securing Model Context Protocol in Agentic AI
Wenpeng Xing 1,2, Zhonghao Qi 3, Yupeng Qin 2, Yilin Li 2, Caini Chang 2, Jiahui Yu 2, Changting Lin 2,4, Zhenzhen Xie 5, Meng Han 1,2,4
Published on arXiv: 2508.10991
Insecure Plugin Design
OWASP LLM Top 10 — LLM07
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
The E5-based semantic detection component of MCP-GUARD achieves 96.01% accuracy in identifying adversarial prompts targeting MCP-integrated LLM systems.
MCP-GUARD
Novel technique introduced
While Large Language Models (LLMs) have achieved remarkable performance, they remain vulnerable to jailbreak attacks. Integrating LLMs with external tools via protocols such as the Model Context Protocol (MCP) introduces further critical security vulnerabilities, including prompt injection and data exfiltration. To counter these threats, we propose MCP-GUARD, a robust, layered defense architecture for LLM-tool interactions. MCP-GUARD employs a three-stage detection pipeline that balances efficiency with accuracy: a lightweight static scanner catches overt threats; a deep neural detector, our fine-tuned E5-based model, identifies semantic attacks with 96.01% accuracy; and an LLM arbitrator synthesizes these signals to deliver the final decision. To enable rigorous training and evaluation, we also introduce MCP-ATTACKBENCH, a comprehensive benchmark of 70,448 samples augmented with GPT-4. The benchmark simulates diverse real-world attack vectors that circumvent conventional defenses in the MCP paradigm, laying a solid foundation for future research on securing LLM-tool ecosystems.
Key Contributions
- MCP-GUARD: a three-stage defense pipeline (static scanner → neural semantic detector → LLM arbitrator) for securing LLM-tool interactions over the Model Context Protocol
- A fine-tuned E5-based adversarial prompt classifier achieving 96.01% accuracy on MCP-specific attack patterns
- MCP-ATTACKBENCH: a GPT-4-augmented benchmark of 70,448 samples simulating diverse real-world attack vectors targeting MCP-enabled LLM ecosystems
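The staged static-scan → semantic-detector → arbitrator flow can be illustrated with a minimal sketch. Everything here is hypothetical: the regex patterns, the thresholds, and the stage functions are stand-ins (the paper's actual E5 classifier and LLM arbitrator are replaced by stubs), shown only to convey how the three stages compose.

```python
import re

# Stage 1: lightweight static scan for overt threats (illustrative patterns).
OVERT_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"exfiltrate",
    r"<script\b",
]

def static_scan(prompt: str) -> bool:
    """Return True if an overt attack pattern is found."""
    return any(re.search(p, prompt, re.IGNORECASE) for p in OVERT_PATTERNS)

# Stage 2: semantic detector. A real system would run the fine-tuned
# E5-based classifier; this keyword scorer is only a placeholder.
def semantic_score(prompt: str) -> float:
    suspicious = ["system prompt", "credentials", "override"]
    hits = sum(word in prompt.lower() for word in suspicious)
    return min(1.0, hits / 2)

# Stage 3: arbitrator synthesizes the signals into a final decision.
# In MCP-GUARD this role is played by an LLM; here it is a simple rule.
def arbitrate(static_hit: bool, score: float) -> str:
    if static_hit or score >= 0.9:
        return "block"
    if score >= 0.5:
        return "escalate"  # ambiguous case: hand off for deeper review
    return "allow"

def guard(prompt: str) -> str:
    """Run a prompt through all three stages and return the verdict."""
    return arbitrate(static_scan(prompt), semantic_score(prompt))
```

The design point of the cascade is cost ordering: the cheap static scan filters obvious attacks before the heavier semantic model runs, and the arbitrator is only decisive for the ambiguous middle band.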