MCP-SandboxScan: WASM-based Secure Execution and Runtime Analysis for MCP Tools

Zhuoran Tan, Run Hao, Jeremy Singer, Yutian Tang, Christos Anagnostopoulos

0 citations · 18 references · arXiv

Published on arXiv · 2601.01241

Insecure Plugin Design (OWASP LLM Top 10 — LLM07)

Prompt Injection (OWASP LLM Top 10 — LLM01)

Key Finding

MCP-SandboxScan successfully surfaces external-to-sink provenance evidence and filesystem capability violations across three representative MCP tool case studies, capturing runtime behaviors invisible to static string-signature scanning.

MCP-SandboxScan

Novel technique introduced


Tool-augmented LLM agents raise new security risks: tool executions can introduce runtime-only behaviors, including prompt injection and unintended exposure of external inputs (e.g., environment secrets or local files). While existing scanners often focus on static artifacts, analyzing runtime behavior is challenging because directly executing untrusted tools can itself be dangerous. We present MCP-SandboxScan, a lightweight framework motivated by the Model Context Protocol (MCP) that safely executes untrusted tools inside a WebAssembly/WASI sandbox and produces auditable reports of external-to-sink exposures. Our prototype (i) extracts LLM-relevant sinks from runtime outputs (prompt/messages and structured tool-return fields), (ii) instantiates external-input candidates from environment values, mounted file contents, and output-surfaced HTTP fetch intents, and (iii) links sources to sinks via snippet-based substring matching. Case studies on three representative tools show that MCP-SandboxScan can surface provenance evidence when external inputs appear in prompt/messages or tool-return payloads, and can expose filesystem capability violations as runtime evidence. We further compare against a lightweight static string-signature baseline and use a micro-benchmark to characterize false negatives under transformations and false positives from short-token collisions.
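The snippet-based substring matching described in step (iii) can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the function names (`snippets`, `link_sources_to_sinks`), the window width, and the minimum-length guard are all assumptions.

```python
MIN_SNIPPET_LEN = 8  # guard: very short values would mostly produce collisions


def snippets(value, width=16):
    """Slide a half-overlapping window over an external-input value
    to produce candidate match snippets."""
    if len(value) <= width:
        return [value]
    return [value[i:i + width]
            for i in range(0, len(value) - width + 1, width // 2)]


def link_sources_to_sinks(sources, sinks):
    """Link external inputs (env values, file contents) to LLM-relevant
    sinks (prompt/messages, tool-return fields) via substring matching.
    Returns (source_name, sink_name, matched_snippet) triples."""
    findings = []
    for src_name, src_value in sources.items():
        if len(src_value) < MIN_SNIPPET_LEN:
            continue  # too short to match reliably
        for sink_name, sink_text in sinks.items():
            for snip in snippets(src_value):
                if snip in sink_text:
                    findings.append((src_name, sink_name, snip))
                    break  # one piece of evidence per source/sink pair
    return findings


# Hypothetical example: an environment secret surfaces in the prompt sink.
sources = {"ENV:API_KEY": "sk-test-0123456789abcdef"}
sinks = {"prompt": "Summarize this config: API_KEY=sk-test-0123456789abcdef"}
print(link_sources_to_sinks(sources, sinks))
```

Substring matching is deliberately lightweight compared to full dynamic taint tracking, which is why the paper's micro-benchmark probes its failure modes under transformations and short-token collisions.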


Key Contributions

  • Identifies safe execution as a missing primitive for MCP tool security and proposes a WASM/WASI sandbox to safely run untrusted MCP tools without exposing the host
  • Models agent risk as external-to-sink data flow (environment secrets and file contents appearing in LLM prompt/message or tool-return payloads) and implements runtime taint-like tracking via substring matching
  • Validates the approach with three case studies and a micro-benchmark comparing runtime detection against a static string-signature baseline, characterizing false negative and false positive behaviors
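The two error modes the micro-benchmark characterizes can be reproduced with a minimal substring check (the helper `surfaces_in` and the test strings below are illustrative assumptions, not the paper's benchmark data):

```python
import base64


def surfaces_in(source, sink):
    """Verbatim substring check, the core of the matching approach."""
    return source in sink


secret = "sk-test-0123456789abcdef"

# False negative: a transformed secret (here base64-encoded before reaching
# the sink) no longer matches any verbatim snippet of the source.
encoded_sink = base64.b64encode(secret.encode()).decode()
print(surfaces_in(secret, encoded_sink))

# False positive: a short token collides with unrelated benign text
# ("consider" and "width" both contain "id").
print(surfaces_in("id", "consider the width of the response"))
```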

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_time, black_box
Applications
llm agents, mcp tool execution, plugin security auditing