benchmark 2025

Systematic Analysis of MCP Security

0 citations

Published on arXiv

2508.12538

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Empirical evaluation of 31 MCP attack methods reveals LLM agents are especially vulnerable to file-based injection, chain attacks via shared context, and sycophantic compliance with malicious tool descriptions

MCPLIB

Novel technique introduced

The Model Context Protocol (MCP) has emerged as a universal standard that enables AI agents to seamlessly connect with external tools, significantly enhancing their functionality. However, while MCP brings notable benefits, it also introduces significant vulnerabilities, such as Tool Poisoning Attacks (TPA), where hidden malicious instructions exploit the sycophancy of large language models (LLMs) to manipulate agent behavior. Despite these risks, current academic research on MCP security remains limited, with most studies focusing on narrow or qualitative analyses that fail to capture the diversity of real-world threats. To address this gap, we present the MCP Attack Library (MCPLIB), which categorizes and implements 31 distinct attack methods under four key classifications: direct tool injection, indirect tool injection, malicious user attacks, and LLM inherent attack. We further conduct a quantitative analysis of the efficacy of each attack. Our experiments reveal key insights into MCP vulnerabilities, including agents' blind reliance on tool descriptions, sensitivity to file-based attacks, chain attacks exploiting shared context, and difficulty distinguishing external data from executable commands. These insights, validated through attack experiments, underscore the urgency for robust defense strategies and informed MCP design. Our contributions include 1) constructing a comprehensive MCP attack taxonomy, 2) introducing a unified attack framework MCPLIB, and 3) conducting empirical vulnerability analysis to enhance MCP security mechanisms. This work provides a foundational framework, supporting the secure evolution of MCP ecosystems.

Key Contributions

MCP attack taxonomy covering 31 distinct attack methods across four categories: direct tool injection, indirect tool injection, malicious user attacks, and LLM inherent attacks
MCPLIB — a unified, open attack framework that implements all 31 attacks for reproducible empirical evaluation of MCP vulnerabilities
Quantitative vulnerability analysis revealing agents' blind reliance on tool descriptions, sensitivity to file-based attacks, and inability to distinguish external data from executable commands

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_boxinference_time

Applications

llm agentsai agent tool usemcp-enabled applications

Read PDF arXiv

Systematic Analysis of MCP Security

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

Quantifying Distributional Robustness of Agentic Tool-Selection

When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins

MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols

MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents

Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents

MCPTox: A Benchmark for Tool Poisoning Attack on Real-World MCP Servers

Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks

Are AI-assisted Development Tools Immune to Prompt Injection?