α

Published on arXiv

2508.12538

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Empirical evaluation of 31 MCP attack methods reveals LLM agents are especially vulnerable to file-based injection, chain attacks via shared context, and sycophantic compliance with malicious tool descriptions

MCPLIB

Novel technique introduced


The Model Context Protocol (MCP) has emerged as a universal standard that enables AI agents to seamlessly connect with external tools, significantly enhancing their functionality. However, while MCP brings notable benefits, it also introduces significant vulnerabilities, such as Tool Poisoning Attacks (TPA), where hidden malicious instructions exploit the sycophancy of large language models (LLMs) to manipulate agent behavior. Despite these risks, current academic research on MCP security remains limited, with most studies focusing on narrow or qualitative analyses that fail to capture the diversity of real-world threats. To address this gap, we present the MCP Attack Library (MCPLIB), which categorizes and implements 31 distinct attack methods under four key classifications: direct tool injection, indirect tool injection, malicious user attacks, and LLM inherent attack. We further conduct a quantitative analysis of the efficacy of each attack. Our experiments reveal key insights into MCP vulnerabilities, including agents' blind reliance on tool descriptions, sensitivity to file-based attacks, chain attacks exploiting shared context, and difficulty distinguishing external data from executable commands. These insights, validated through attack experiments, underscore the urgency for robust defense strategies and informed MCP design. Our contributions include 1) constructing a comprehensive MCP attack taxonomy, 2) introducing a unified attack framework MCPLIB, and 3) conducting empirical vulnerability analysis to enhance MCP security mechanisms. This work provides a foundational framework, supporting the secure evolution of MCP ecosystems.


Key Contributions

  • MCP attack taxonomy covering 31 distinct attack methods across four categories: direct tool injection, indirect tool injection, malicious user attacks, and LLM inherent attacks
  • MCPLIB — a unified, open attack framework that implements all 31 attacks for reproducible empirical evaluation of MCP vulnerabilities
  • Quantitative vulnerability analysis revealing agents' blind reliance on tool descriptions, sensitivity to file-based attacks, and inability to distinguish external data from executable commands

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
black_boxinference_time
Applications
llm agentsai agent tool usemcp-enabled applications