defense 2025

Securing the Model Context Protocol: Defending LLMs Against Tool Poisoning and Adversarial Attacks

Saeid Jamshidi ¹, Kawser Wazed Nafi ¹, Arghavan Moradi Dakhel ¹, Negar Shahabi ², Foutse Khomh ¹, Naser Ezzati-Jivan ³

¹ Polytechnique Montréal

² Concordia University

³ Brock University

5 citations · 39 references · arXiv

Published on arXiv

2512.06556

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

GPT-4 blocks ~71% of unsafe tool calls and DeepSeek achieves 97% resilience against Shadowing attacks; the framework reduces unsafe invocations without requiring model fine-tuning.

MCP Layered Security Framework

Novel technique introduced

The Model Context Protocol (MCP) enables Large Language Models to integrate external tools through structured descriptors, increasing autonomy in decision-making, task execution, and multi-agent workflows. However, this autonomy creates a largely overlooked security gap. Existing defenses focus on prompt-injection attacks and fail to address threats embedded in tool metadata, leaving MCP-based systems exposed to semantic manipulation. This work analyzes three classes of semantic attacks on MCP-integrated systems: (1) Tool Poisoning, where adversarial instructions are hidden in tool descriptors; (2) Shadowing, where trusted tools are indirectly compromised through contaminated shared context; and (3) Rug Pulls, where descriptors are altered after approval to subvert behavior. To counter these threats, we introduce a layered security framework with three components: RSA-based manifest signing to enforce descriptor integrity, LLM-on-LLM semantic vetting to detect suspicious tool definitions, and lightweight heuristic guardrails that block anomalous tool behavior at runtime. Through evaluation of GPT-4, DeepSeek, and Llama-3.5 across eight prompting strategies, we find that security performance varies widely by model architecture and reasoning method. GPT-4 blocks about 71 percent of unsafe tool calls, balancing latency and safety. DeepSeek shows the highest resilience to Shadowing attacks but with greater latency, while Llama-3.5 is fastest but least robust. Our results show that the proposed framework reduces unsafe tool invocation rates without model fine-tuning or internal modification.

Key Contributions

Taxonomy and analysis of three novel MCP-specific attack classes: Tool Poisoning, Shadowing, and Rug Pulls
Layered security framework combining RSA-based manifest signing, LLM-on-LLM semantic vetting, and heuristic runtime guardrails
Comparative evaluation of GPT-4, DeepSeek, and Llama-3.5 across eight prompting strategies under adversarial MCP conditions, revealing wide performance variation (GPT-4 ~71% block rate, DeepSeek 97% resilience against Shadowing)

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_boxinference_time

Applications

llm agentsmulti-agent systemstool-augmented llmsagentic workflows

Read PDF arXiv DOI

Securing the Model Context Protocol: Defending LLMs Against Tool Poisoning and Adversarial Attacks

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

Introducing the Generative Application Firewall (GAF)

Breaking the Protocol: Security Analysis of the Model Context Protocol Specification and Prompt Injection Vulnerabilities in Tool-Integrated LLM Agents

AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification

ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback

VIGIL: Defending LLM Agents Against Tool Stream Injection via Verify-Before-Commit

Defense Against Indirect Prompt Injection via Tool Result Parsing

Quantifying Conversation Drift in MCP via Latent Polytope

MCP-Guard: A Multi-Stage Defense-in-Depth Framework for Securing Model Context Protocol in Agentic AI