CASCADE: A Cascaded Hybrid Defense Architecture for Prompt Injection Detection in MCP-Based Systems

Model Context Protocol (MCP) is a rapidly adopted standard for defining and invoking external tools in LLM applications. The multi-layered architecture of MCP introduces new attack surfaces such as tool poisoning, in addition to traditional prompt injection. Existing defense systems suffer from limitations including high false positive rates, API dependency, or white-box access requirements. In this study, we propose CASCADE, a three-tiered cascaded defense architecture for MCP-based systems: (i) Layer 1 performs fast pre-filtering using regex, phrase weighting, and entropy analysis; (ii) Layer 2 conducts semantic analysis via BGE embedding with an Ollama Llama3 fallback mechanism; (iii) Layer 3 applies pattern-based output filtering. Evaluation on a dataset of 5,000 samples yielded 95.85% precision, 6.06% false positive rate, 61.05% recall, and 74.59% F1-score. Analysis across 31 attack types categorized into 6 tiers revealed high detection rates for data exfiltration (91.5%) and prompt injection (84.2%), while semantic attack (52.5%) and tool poisoning (59.9%) categories showed potential for improvement. A key advantage of CASCADE over existing solutions is its fully local operation, requiring no external API calls

Key Contributions

Three-tiered cascaded defense architecture (regex/entropy → BGE embedding with Llama3 fallback → output filtering) for MCP-based systems
Embedding-first, LLM-fallback strategy enabling fully local operation without external API calls
Three-decision output mechanism (ALLOW/REVIEW/BLOCK) for human-in-the-loop handling of ambiguous cases

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

inference_timeblack_box

Datasets

Custom multi-source dataset (5,000 samples: 1,521 benign, 3,479 malicious from GitHub Adversarial and other sources)

Applications

llm tool callingmcp-based ai agents

2026 2 cit.

100%