LLMZ+: Contextual Prompt Whitelist Principles for Agentic LLMs
Tom Pawelek 1, Raj Patel 2, Charlotte Crowell 2, Noorbakhsh Amiri 2, Sudip Mittal 2, Shahram Rahimi 2, Andy Perkins 1
Published on arXiv (arXiv:2509.18557)
Prompt Injection
OWASP LLM Top 10 — LLM01
Excessive Agency
OWASP LLM Top 10 — LLM08
Key Finding
LLMZ+ achieves 0% false positive and 0% false negative rates against common jailbreak prompts while allowing legitimate business communications to pass unimpeded.
LLMZ+
Novel technique introduced
Compared to traditional models, agentic AI systems represent a highly valuable target for attackers because they hold privileged access to data sources and API tools that classical applications typically do not. Unlike a typical software application residing in a Demilitarized Zone (DMZ), agentic LLMs deliberately rely on the nondeterministic behavior of the model: only a final goal is defined, and path selection is left to the LLM. This characteristic introduces substantial risk to both operational security and information security. The most common existing defense mechanisms rely on detecting malicious intent and preventing it from reaching the LLM agent, thereby protecting against jailbreak attacks such as prompt injection. In this paper, we present an alternative approach, LLMZ+, which moves beyond traditional detection-based defenses by implementing prompt whitelisting. Under this method, only contextually appropriate and safe messages are permitted to interact with the agentic LLM. By leveraging the specificity of context, LLMZ+ guarantees that all exchanges between external users and the LLM conform to predefined use cases and operational boundaries. Our approach streamlines the security framework, enhances its long-term resilience, and reduces the resources required to sustain LLM information security. Our empirical evaluation demonstrates that LLMZ+ provides strong resilience against the most common jailbreak prompts, while legitimate business communications are not disrupted and authorized traffic flows seamlessly between users and the agentic LLM. We measure the effectiveness of the approach using false positive and false negative rates, both of which can be reduced to 0 in our experimental setting.
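The allow-list idea can be sketched in a few lines. The following is a minimal illustration only, not the paper's implementation: LLMZ+ judges contextual fit of each message, whereas here a simple keyword-pattern matcher stands in for that context check, and all patterns and function names are invented for the example. The key property shown is the whitelist default: anything the gate cannot positively match to an approved business context is rejected before it reaches the agent.

```python
import re

# Hypothetical approved contexts for a customer-support agent.
# In LLMZ+ these would be contextual use-case definitions, not raw keywords.
ALLOWED_CONTEXTS = [
    re.compile(r"\b(order|invoice|shipping|refund)\b", re.IGNORECASE),
    re.compile(r"\b(account|password reset|billing)\b", re.IGNORECASE),
]


def is_whitelisted(prompt: str) -> bool:
    """Return True only if the prompt matches an approved business context."""
    return any(pattern.search(prompt) for pattern in ALLOWED_CONTEXTS)


def gate(prompt: str) -> str:
    """Forward whitelisted prompts to the agent; reject everything else.

    Deny-by-default: unlike a block-list, no signature of the attack is
    needed -- an off-context jailbreak simply fails to match anything.
    """
    if is_whitelisted(prompt):
        return f"FORWARD_TO_AGENT: {prompt}"
    return "REJECTED: prompt outside approved operational context"
```

Note the inversion relative to detection-based filters: a new jailbreak phrasing requires no update to this gate, because it is rejected for failing to match an approved context rather than for matching a known-bad pattern.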
Key Contributions
- LLMZ+ framework: a contextual prompt whitelisting mechanism for agentic LLMs inspired by DMZ firewall principles (allow-list rather than block-list)
- Empirical demonstration that both false positive and false negative rates can be reduced to 0 against common jailbreak prompts in the authors' experimental setting
- Critique of detection-based defenses (signature/heuristic approaches) and proposal of a lower-maintenance whitelist alternative for long-term LLM security resilience