defense 2025

Prompt Fencing: A Cryptographic Approach to Establishing Security Boundaries in Large Language Model Prompts

Steven Peh

1 citations · 6 references · arXiv

α

Published on arXiv

2511.19727

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Prompt Fencing with simulated fence awareness reduces prompt injection attack success from 86.7% to 0% across 300 test cases.

Prompt Fencing

Novel technique introduced


Large Language Models (LLMs) remain vulnerable to prompt injection attacks, representing the most significant security threat in production deployments. We present Prompt Fencing, a novel architectural approach that applies cryptographic authentication and data architecture principles to establish explicit security boundaries within LLM prompts. Our approach decorates prompt segments with cryptographically signed metadata including trust ratings and content types, enabling LLMs to distinguish between trusted instructions and untrusted content. While current LLMs lack native fence awareness, we demonstrate that simulated awareness through prompt instructions achieved complete prevention of injection attacks in our experiments, reducing success rates from 86.7% (260/300 successful attacks) to 0% (0/300 successful attacks) across 300 test cases with two leading LLM providers. We implement a proof-of-concept fence generation and verification pipeline with a total overhead of 0.224 seconds (0.130s for fence generation, 0.094s for validation) across 100 samples. Our approach is platform-agnostic and can be incrementally deployed as a security layer above existing LLM infrastructure, with the expectation that future models will be trained with native fence awareness for optimal security.


Key Contributions

  • Prompt Fencing architectural approach that decorates prompt segments with cryptographically signed metadata (trust ratings, content types) to establish explicit security boundaries between trusted and untrusted content
  • Empirical demonstration that simulated fence awareness reduces prompt injection success rate from 86.7% (260/300) to 0% (0/300) across 300 test cases with two leading LLM providers
  • Proof-of-concept fence generation and verification pipeline with 0.224s total overhead, designed as a platform-agnostic security layer above existing LLM infrastructure

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
black_boxinference_time
Datasets
300 test cases (two undisclosed LLM providers)
Applications
llm production deploymentsprompt injection defense