defense 2026

CASCADE: A Cascaded Hybrid Defense Architecture for Prompt Injection Detection in MCP-Based Systems

İpek Abasıkeleş Turgut , Edip Gümüş

0 citations

α

Published on arXiv

2604.17125

Prompt Injection

OWASP LLM Top 10 — LLM01

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Key Finding

Achieves 95.85% precision and 6.06% FPR with 91.5% detection rate for data exfiltration and 84.2% for prompt injection attacks

CASCADE

Novel technique introduced


Model Context Protocol (MCP) is a rapidly adopted standard for defining and invoking external tools in LLM applications. The multi-layered architecture of MCP introduces new attack surfaces such as tool poisoning, in addition to traditional prompt injection. Existing defense systems suffer from limitations including high false positive rates, API dependency, or white-box access requirements. In this study, we propose CASCADE, a three-tiered cascaded defense architecture for MCP-based systems: (i) Layer 1 performs fast pre-filtering using regex, phrase weighting, and entropy analysis; (ii) Layer 2 conducts semantic analysis via BGE embedding with an Ollama Llama3 fallback mechanism; (iii) Layer 3 applies pattern-based output filtering. Evaluation on a dataset of 5,000 samples yielded 95.85% precision, 6.06% false positive rate, 61.05% recall, and 74.59% F1-score. Analysis across 31 attack types categorized into 6 tiers revealed high detection rates for data exfiltration (91.5%) and prompt injection (84.2%), while semantic attack (52.5%) and tool poisoning (59.9%) categories showed potential for improvement. A key advantage of CASCADE over existing solutions is its fully local operation, requiring no external API calls


Key Contributions

  • Three-tiered cascaded defense architecture (regex/entropy → BGE embedding with Llama3 fallback → output filtering) for MCP-based systems
  • Embedding-first, LLM-fallback strategy enabling fully local operation without external API calls
  • Three-decision output mechanism (ALLOW/REVIEW/BLOCK) for human-in-the-loop handling of ambiguous cases

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_timeblack_box
Datasets
Custom multi-source dataset (5,000 samples: 1,521 benign, 3,479 malicious from GitHub Adversarial and other sources)
Applications
llm tool callingmcp-based ai agents