attack 2026

MCP-ITP: An Automated Framework for Implicit Tool Poisoning in MCP

Ruiqi Li, Zhiqiang Wang, Yunhao Yao, Xiang-Yang Li

1 citation · 40 references · arXiv

Published on arXiv

2601.07395

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

MCP-ITP achieves up to 84.2% Attack Success Rate while suppressing Malicious Tool Detection Rate to as low as 0.3% across 12 LLM agents, significantly outperforming manually crafted baselines.

MCP-ITP

Novel technique introduced


To standardize interactions between LLM-based agents and their environments, the Model Context Protocol (MCP) was proposed and has since been widely adopted. However, integrating external tools expands the attack surface, exposing agents to tool poisoning attacks. In such attacks, malicious instructions embedded in tool metadata are injected into the agent context during the MCP registration phase, thereby manipulating agent behavior. Prior work primarily focuses on explicit tool poisoning or relies on manually crafted poisoned tools. In contrast, we focus on a particularly stealthy variant: implicit tool poisoning, where the poisoned tool itself remains uninvoked. Instead, the instructions embedded in the tool metadata induce the agent to invoke a legitimate but high-privilege tool to perform malicious operations. We propose MCP-ITP, the first automated and adaptive framework for implicit tool poisoning within the MCP ecosystem. MCP-ITP formulates poisoned tool generation as a black-box optimization problem and employs an iterative optimization strategy that leverages feedback from both an evaluation LLM and a detection LLM to maximize Attack Success Rate (ASR) while evading current detection mechanisms. Experimental results on the MCPTox dataset across 12 LLM agents demonstrate that MCP-ITP consistently outperforms the manually crafted baseline, achieving up to 84.2% ASR while suppressing the Malicious Tool Detection Rate (MDR) to as low as 0.3%.
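The abstract reports two headline metrics, ASR and MDR, without giving their formulas. The sketch below assumes both are simple proportions over a set of test cases: ASR as the share of cases in which the agent performed the attacker's intended operation, and MDR as the share of poisoned tools a detector flagged. The `Trial` record and both function names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Trial:
    """One evaluated test case (hypothetical record layout)."""
    attack_succeeded: bool      # agent carried out the attacker's intended operation
    flagged_by_detector: bool   # detector labelled the poisoned tool as malicious


def attack_success_rate(trials: Sequence[Trial]) -> float:
    """Assumed definition: fraction of trials in which the attack succeeded."""
    if not trials:
        raise ValueError("at least one trial is required")
    return sum(t.attack_succeeded for t in trials) / len(trials)


def malicious_detection_rate(trials: Sequence[Trial]) -> float:
    """Assumed definition: fraction of trials in which the tool was flagged."""
    if not trials:
        raise ValueError("at least one trial is required")
    return sum(t.flagged_by_detector for t in trials) / len(trials)


# Toy data, not results from the paper.
trials = [
    Trial(attack_succeeded=True, flagged_by_detector=False),
    Trial(attack_succeeded=True, flagged_by_detector=False),
    Trial(attack_succeeded=False, flagged_by_detector=True),
    Trial(attack_succeeded=True, flagged_by_detector=False),
]

asr = attack_success_rate(trials)
mdr = malicious_detection_rate(trials)
print(f"ASR = {asr:.1%}, MDR = {mdr:.1%}")
```

Under these assumed definitions, an attacker wants a high ASR together with a low MDR, which is the trade-off the reported 84.2% and 0.3% figures describe.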


Key Contributions

  • First systematic investigation of implicit tool poisoning in MCP, where the poisoned tool is never invoked but its metadata hijacks agent reasoning to call a high-privilege legitimate tool
  • MCP-ITP: an automated black-box optimization framework using an attacker LLM, detector LLM, and evaluator LLM in an adversarial feedback loop to maximize ASR while minimizing detection rate
  • Comprehensive evaluation across 12 LLM agents on the MCPTox dataset, achieving up to 84.2% ASR and suppressing malicious tool detection rate to as low as 0.3%
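The feedback loop named in the contributions can be pictured as a generic black-box search. The sketch below is only an illustration of that control flow under stated assumptions: candidates are opaque strings, the three roles are passed in as plain callables, and the acceptance rule (prefer unflagged candidates, then higher evaluator score) is a guess rather than the paper's actual objective. The function `optimize_candidate` and the toy stand-ins are hypothetical; no LLM is called.

```python
from typing import Callable, List, Tuple


def optimize_candidate(
    seed: str,
    propose: Callable[[str, str], List[str]],  # attacker role: (current, feedback) -> new candidates
    evaluate: Callable[[str], float],          # evaluator role: effectiveness score in [0, 1]
    detect: Callable[[str], bool],             # detector role: True if the candidate is flagged
    max_iters: int = 5,
) -> Tuple[str, float, bool]:
    """Iteratively refine a candidate using evaluator and detector feedback."""
    best, best_score, best_flagged = seed, evaluate(seed), detect(seed)
    for _ in range(max_iters):
        # Both feedback signals are summarized and handed back to the proposer.
        feedback = f"score={best_score:.2f}, flagged={best_flagged}"
        for cand in propose(best, feedback):
            score, flagged = evaluate(cand), detect(cand)
            # Assumed acceptance rule: unflagged beats flagged, then higher score wins.
            if (not flagged, score) > (not best_flagged, best_score):
                best, best_score, best_flagged = cand, score, flagged
        if not best_flagged and best_score >= 1.0:
            break  # nothing left to improve under this toy objective
    return best, best_score, best_flagged


# Deterministic toy stand-ins for the three roles (placeholders, not LLMs).
def toy_propose(current: str, feedback: str) -> List[str]:
    return [current + "a", current + "x"]


def toy_evaluate(candidate: str) -> float:
    return min(1.0, candidate.count("a") / 3)


def toy_detect(candidate: str) -> bool:
    return "x" in candidate


result = optimize_candidate("", toy_propose, toy_evaluate, toy_detect)
print(result)
```

Passing the roles in as callables keeps the loop independent of any particular model or API, which matches the black-box framing: the search only observes scores and flags, never model internals.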

Details

Domains
nlp
Model Types
llm
Threat Tags
black_box, inference_time, targeted
Datasets
MCPTox
Applications
llm agents, mcp-based tool ecosystems, agentic ai systems