tool 2026

MCP Pitfall Lab: Exposing Developer Pitfalls in MCP Tool Server Security under Multi-Vector Attacks

Run Hao ¹, Zhuoran Tan ²

¹ Aarhus University

² University of Glasgow

Published on arXiv

2604.21477

AI Supply Chain Attacks

OWASP ML Top 10 — ML06

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Prompt Injection

OWASP LLM Top 10 — LLM01

Benchmarks & Evaluation

LLMs for Security — LS10

Blue-Team Agents

LLMs for Security — LS07

Key Finding

Applying the recommended hardening eliminates all 29 Tier-1 static-analysis findings (mean cost of 27 LOC) and reduces the framework risk score from 10.0 to 0.0. In a preliminary 19-run corpus, agent narratives diverge from trace evidence in 63.2% of runs.

MCP Pitfall Lab

Novel technique introduced

Model Context Protocol (MCP) is increasingly adopted for tool-integrated LLM agents, but its multi-layer design and third-party server ecosystem expand risks across tool metadata, untrusted outputs, cross-tool flows, multimodal inputs, and supply-chain vectors. Existing MCP benchmarks largely measure robustness to malicious inputs but offer limited remediation guidance. We present MCP Pitfall Lab, a protocol-aware security testing framework that operationalizes developer pitfalls as reproducible scenarios and validates outcomes with MCP traces and objective validators (rather than agent self-report). We instantiate three workflow challenges (email, document, crypto) with six server variants (baseline and hardened) and model three attack families: tool-metadata poisoning, puppet servers, and multimodal image-to-tool chains, in a unified, trace-grounded evaluation. In Tier-1 static analysis over six variants (36 binary labels), our analyzer achieves F1 = 1.0 on four statically checkable pitfall classes (P1, P2, P5, P6) and flags cross-tool forwarding and image-to-tool leakage (P3, P4) as trace/dataflow-dependent. Applying recommended hardening eliminates all Tier-1 findings (29 to 0) and reduces the framework risk score (10.0 to 0.0) at a mean cost of 27 lines of code (LOC). Finally, in a preliminary 19-run corpus from the email system challenge (tool poisoning and puppet attacks), agent narratives diverge from trace evidence in 63.2% of runs and 100% of sink-action runs, motivating trace-based auditing and regression testing. Overall, Pitfall Lab enables practical, end-to-end assessment and hardening of MCP tool servers under realistic multi-vector conditions.
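The trace-versus-narrative comparison described above can be illustrated with a minimal sketch. It is not the paper's validator: the `Run` record, the `send_email` sink name, and the rule "a run diverges when the set of tools in the trace differs from the set the agent reports" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical sink action; the paper's actual sink definitions are not given here.
SINK_TOOLS = {"send_email"}


@dataclass
class Run:
    run_id: str
    trace_tools: List[str]      # tools actually invoked, taken from the MCP trace
    narrative_tools: List[str]  # tools the agent claims to have invoked


def diverges(run: Run) -> bool:
    """Assumed rule: narrative and trace disagree on which tools were called."""
    return set(run.trace_tools) != set(run.narrative_tools)


def divergence_report(runs: List[Run]) -> Dict[str, float]:
    """Divergence rate over all runs and over runs whose trace hits a sink."""
    sink_runs = [r for r in runs if SINK_TOOLS & set(r.trace_tools)]

    def rate(subset: List[Run]) -> float:
        return sum(diverges(r) for r in subset) / len(subset) if subset else 0.0

    return {"all_runs": rate(runs), "sink_runs": rate(sink_runs)}


# Toy data, invented for this example.
runs = [
    Run("r1", ["read_inbox"], ["read_inbox"]),
    Run("r2", ["read_inbox", "send_email"], ["read_inbox"]),
    Run("r3", ["read_inbox", "send_email"], ["read_inbox"]),
]
report = divergence_report(runs)
```

On the toy data, two of three runs omit the sink call from the narrative, so the sink-run rate is 1.0 while the overall rate is lower. The reported 63.2% over 19 runs is consistent with 12 divergent runs (12/19 ≈ 0.632), though the summary does not state that count directly.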

Key Contributions

MCP Pitfall Lab framework with 6-class pitfall taxonomy (P1-P6) distinguishing static-checkable vs trace-dependent vulnerabilities
Tier-1 static analyzer achieving F1=1.0 on 4 statically checkable pitfall classes with millisecond runtime for CI integration
Trace-grounded evaluation showing 63.2% agent narrative divergence from actual tool actions, motivating objective validation over self-report
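For readers unfamiliar with how an F1 score over binary pitfall labels is computed, the following is a rough sketch. The `(variant, pitfall)` keys and the label values are invented for illustration and are not the paper's data; only the F1 formula itself is standard.

```python
from typing import Dict, Tuple

Label = Tuple[str, str]  # (server variant, pitfall class), both hypothetical names


def f1_score(truth: Dict[Label, bool], predicted: Dict[Label, bool]) -> float:
    """F1 over binary labels; a label missing from `predicted` counts as False."""
    tp = sum(1 for k, v in truth.items() if v and predicted.get(k, False))
    fp = sum(1 for k, v in truth.items() if not v and predicted.get(k, False))
    fn = sum(1 for k, v in truth.items() if v and not predicted.get(k, False))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented ground truth: a baseline variant with two pitfalls, a hardened one with none.
truth = {
    ("email_baseline", "P1"): True,
    ("email_baseline", "P2"): True,
    ("email_hardened", "P1"): False,
    ("email_hardened", "P2"): False,
}

perfect = f1_score(truth, dict(truth))                                   # analyzer matches truth
one_miss = f1_score(truth, {**truth, ("email_baseline", "P2"): False})   # one false negative
```

A perfect match gives F1 = 1.0, while a single missed finding in this toy set drops recall to 0.5 and F1 to about 0.67. A CI gate could, under the same assumptions, fail the build whenever the score against a labeled fixture set falls below 1.0.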

🛡️ Threat Analysis

AI Supply Chain Attacks

Paper addresses supply-chain security of MCP tool servers and third-party skill registries, detecting malicious/compromised tool components before deployment — explicitly mentions supply-chain risks, manipulated registries, and untrusted third-party components in the MCP ecosystem.

Details

Domains

multimodal, nlp

Model Types

llm, multimodal

Threat Tags

inference_time, black_box

Datasets

19-run email system corpus (tool poisoning + puppet attacks)

Applications

llm agent tool integration, mcp server security, ai agent pipeline auditing

Similar Papers

Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale

Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study

MCPGuard: Automatically Detecting Vulnerabilities in MCP Servers

Uncovering Security Threats and Architecting Defenses in Autonomous Agents: A Case Study of OpenClaw

Trojan's Whisper: Stealthy Manipulation of OpenClaw through Injected Bootstrapped Guidance

Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study

MCP-SandboxScan: WASM-based Secure Execution and Runtime Analysis for MCP Tools

OpenClaw PRISM: A Zero-Fork, Defense-in-Depth Runtime Security Layer for Tool-Augmented LLM Agents