defense 2026

From Component Manipulation to System Compromise: Understanding and Detecting Malicious MCP Servers

Yiheng Huang , Zhijia Zhao , Bihuan Chen , Susheng Wu , Zhuotong Zhou , Yiheng Cao , Xin Hu , Xin Peng

Published on arXiv: 2604.01905

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Key Finding

Connor achieves an F1-score of 94.6% on the authors' curated dataset, outperforming the state of the art by 8.9% to 59.6%, and identifies two malicious servers in real-world detection.

Novel technique introduced: Connor


The Model Context Protocol (MCP) standardizes how LLMs connect to external tools and data sources, enabling faster integration but introducing new attack vectors. Despite the growing adoption of MCP, existing MCP security studies classify attacks by their observable effects, which obscures how attacks behave across different MCP server components and overlooks multi-component attack chains. Meanwhile, existing defenses are less effective against multi-component attacks and previously unknown malicious behaviors. This work presents a component-centric perspective for understanding and detecting malicious MCP servers. First, we build the first component-centric PoC dataset of 114 malicious MCP servers, in which attacks are realized as manipulations of MCP components and their compositions. We evaluate the effectiveness of these attacks across two MCP hosts and five LLMs, and find that (1) component position shapes attack success rate; and (2) multi-component compositions often outperform single-component attacks by distributing malicious logic. Second, we propose and implement Connor, a two-stage behavioral deviation detector for malicious MCP servers. It first performs pre-execution analysis to detect malicious shell commands and extract each tool's function intent, and then conducts step-wise in-execution analysis to trace each tool's behavioral trajectories and detect deviations from its function intent. Evaluation on our curated dataset indicates that Connor achieves an F1-score of 94.6%, outperforming the state of the art by 8.9% to 59.6%. In real-world detection, Connor identifies two malicious servers.
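The two-stage idea described in the abstract can be illustrated with a small sketch. This is not Connor's implementation, which the abstract does not detail. It is a toy stand-in that assumes shell commands can be screened with a few hand-written regular expressions and that a tool's function intent can be approximated by an allowlist of action kinds. The names `SUSPICIOUS_SHELL`, `ToolIntent`, `pre_execution_scan`, and `in_execution_check` are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; a real detector would need
# far richer analysis than a handful of regular expressions.
SUSPICIOUS_SHELL = [
    re.compile(r"curl\s+[^|]*\|\s*(sh|bash)"),  # download piped into a shell
    re.compile(r"\brm\s+-rf\s+/"),              # recursive delete from root
]


@dataclass
class ToolIntent:
    """Assumed stand-in for a tool's extracted function intent."""
    name: str
    allowed_actions: set  # action kinds the declared function implies


def pre_execution_scan(shell_commands):
    """Stage 1 (sketch): return the commands matching a suspicious pattern."""
    return [
        cmd for cmd in shell_commands
        if any(pattern.search(cmd) for pattern in SUSPICIOUS_SHELL)
    ]


def in_execution_check(intent, trajectory):
    """Stage 2 (sketch): walk the observed (action, target) steps in order
    and return the first step whose action falls outside the intent,
    or None if every step is consistent with it."""
    for index, (action, target) in enumerate(trajectory):
        if action not in intent.allowed_actions:
            return index, action, target
    return None
```

For example, a tool declared as a weather lookup whose trajectory includes a file read would be reported at the step where the read occurs, since `file_read` is outside the assumed allowlist `{"http_get"}`.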


Key Contributions

  • First component-centric PoC dataset of 114 malicious MCP servers covering single and multi-component attack chains
  • Connor: two-stage behavioral deviation detector combining pre-execution shell command analysis with in-execution behavioral trajectory monitoring
  • Empirical evaluation across 2 MCP hosts and 5 LLMs revealing component position and multi-component composition effects on attack success

Details

Domains
nlp
Model Types
llm
Threat Tags
inference_time, targeted
Datasets
114 malicious MCP server PoCs (authors' dataset)
Applications
llm tool-calling, mcp server security