MCP-SandboxScan: WASM-based Secure Execution and Runtime Analysis for MCP Tools

Zhuoran Tan, Run Hao, Jeremy Singer, Yutian Tang, Christos Anagnostopoulos

0 citations · 18 references · arXiv

Published on arXiv · 2601.01241

Insecure Plugin Design (OWASP LLM Top 10 — LLM07)

Prompt Injection (OWASP LLM Top 10 — LLM01)

Key Finding

MCP-SandboxScan successfully surfaces external-to-sink provenance evidence and filesystem capability violations across three representative MCP tool case studies, capturing runtime behaviors invisible to static string-signature scanning.

MCP-SandboxScan

Novel technique introduced


Tool-augmented LLM agents raise new security risks: tool executions can introduce runtime-only behaviors, including prompt injection and unintended exposure of external inputs (e.g., environment secrets or local files). While existing scanners often focus on static artifacts, analyzing runtime behavior is challenging because directly executing untrusted tools can itself be dangerous. We present MCP-SandboxScan, a lightweight framework motivated by the Model Context Protocol (MCP) that safely executes untrusted tools inside a WebAssembly/WASI sandbox and produces auditable reports of external-to-sink exposures. Our prototype (i) extracts LLM-relevant sinks from runtime outputs (prompt/messages and structured tool-return fields), (ii) instantiates external-input candidates from environment values, mounted file contents, and output-surfaced HTTP fetch intents, and (iii) links sources to sinks via snippet-based substring matching. Case studies on three representative tools show that MCP-SandboxScan can surface provenance evidence when external inputs appear in prompt/messages or tool-return payloads, and can expose filesystem capability violations as runtime evidence. We further compare against a lightweight static string-signature baseline and use a micro-benchmark to characterize false negatives under transformations and false positives from short-token collisions.
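The snippet-based substring matching described in step (iii) can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the function names (`snippets`, `link_sources_to_sinks`), the window width, and the minimum-length guard are all assumptions.

```python
MIN_SNIPPET_LEN = 8  # guard: very short values would mostly produce collisions


def snippets(value, width=16):
    """Slide a half-overlapping window over an external-input value
    to produce candidate match snippets."""
    if len(value) <= width:
        return [value]
    return [value[i:i + width]
            for i in range(0, len(value) - width + 1, width // 2)]


def link_sources_to_sinks(sources, sinks):
    """Link external inputs (env values, file contents) to LLM-relevant
    sinks (prompt/messages, tool-return fields) via substring matching.
    Returns (source_name, sink_name, matched_snippet) triples."""
    findings = []
    for src_name, src_value in sources.items():
        if len(src_value) < MIN_SNIPPET_LEN:
            continue  # too short to match reliably
        for sink_name, sink_text in sinks.items():
            for snip in snippets(src_value):
                if snip in sink_text:
                    findings.append((src_name, sink_name, snip))
                    break  # one piece of evidence per source/sink pair
    return findings


# Hypothetical example: an environment secret surfaces in the prompt sink.
sources = {"ENV:API_KEY": "sk-test-0123456789abcdef"}
sinks = {"prompt": "Summarize this config: API_KEY=sk-test-0123456789abcdef"}
print(link_sources_to_sinks(sources, sinks))
```

Substring matching is deliberately lightweight compared to full dynamic taint tracking, which is why the paper's micro-benchmark probes its failure modes under transformations and short-token collisions.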


Key Contributions

  • Identifies safe execution as a missing primitive for MCP tool security and proposes a WASM/WASI sandbox to safely run untrusted MCP tools without exposing the host
  • Models agent risk as external-to-sink data flow (environment secrets and file contents appearing in LLM prompt/message or tool-return payloads) and implements runtime taint-like tracking via substring matching
  • Validates the approach with three case studies and a micro-benchmark comparing runtime detection against a static string-signature baseline, characterizing false negative and false positive behaviors
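The two error modes the micro-benchmark characterizes can be reproduced with a minimal substring check (the helper `surfaces_in` and the test strings below are illustrative assumptions, not the paper's benchmark data):

```python
import base64


def surfaces_in(source, sink):
    """Verbatim substring check, the core of the matching approach."""
    return source in sink


secret = "sk-test-0123456789abcdef"

# False negative: a transformed secret (here base64-encoded before reaching
# the sink) no longer matches any verbatim snippet of the source.
encoded_sink = base64.b64encode(secret.encode()).decode()
print(surfaces_in(secret, encoded_sink))

# False positive: a short token collides with unrelated benign text
# ("consider" and "width" both contain "id").
print(surfaces_in("id", "consider the width of the response"))
```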

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_time, black_box
Applications
llm agents, mcp tool execution, plugin security auditing