RedVisor: Reasoning-Aware Prompt Injection Defense via Zero-Copy KV Cache Reuse
Mingrui Liu, Sixiao Zhang, Cheng Long, Kwok-Yan Lam
Published on arXiv
2602.01795
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
RedVisor outperforms state-of-the-art defenses in detection accuracy and throughput while incurring negligible utility loss on benign inputs.
RedVisor
Novel technique introduced
Large Language Models (LLMs) are increasingly vulnerable to Prompt Injection (PI) attacks, where adversarial instructions hidden within retrieved contexts hijack the model's execution flow. Current defenses typically face a critical trade-off: prevention-based fine-tuning often degrades general utility via the "alignment tax", while detection-based filtering incurs prohibitive latency and memory costs. To bridge this gap, we propose RedVisor, a unified framework that synthesizes the explainability of detection systems with the seamless integration of prevention strategies. To the best of our knowledge, RedVisor is the first approach to leverage fine-grained reasoning paths to simultaneously detect attacks and guide the model's safe response. We implement this via a lightweight, removable adapter positioned atop the frozen backbone. This adapter serves a dual function: it first generates an explainable analysis that precisely localizes the injection and articulates the threat, which then explicitly conditions the model to reject the malicious command. Uniquely, the adapter is active only during this reasoning phase and is effectively muted during the subsequent response generation. This architecture yields two distinct advantages: (1) it mathematically preserves the backbone's original utility on benign inputs; and (2) it enables a novel KV Cache Reuse strategy, eliminating the redundant prefill computation inherent to decoupled pipelines. We further pioneer the integration of this defense into the vLLM serving engine with custom kernels. Experiments demonstrate that RedVisor outperforms state-of-the-art defenses in detection accuracy and throughput while incurring negligible utility loss.
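The phase-gated adapter described above can be illustrated with a minimal sketch. This is not the paper's implementation: the class name, the LoRA-style low-rank update, and the boolean phase flag are all illustrative assumptions chosen to show the key property, namely that when the adapter is muted, the backbone's outputs are bit-identical to the frozen model's, which is how benign-input utility is preserved exactly.

```python
import numpy as np

class PhaseGatedAdapter:
    """Hypothetical sketch of a removable adapter that perturbs the frozen
    backbone's hidden states only during the reasoning (detection) phase."""

    def __init__(self, hidden_dim: int, rank: int = 4, seed: int = 0):
        rng = np.random.default_rng(seed)
        # LoRA-style low-rank factors (an assumption; the paper does not
        # specify the adapter parameterization here)
        self.A = rng.normal(scale=0.02, size=(hidden_dim, rank))
        self.B = rng.normal(scale=0.02, size=(rank, hidden_dim))
        self.reasoning_phase = True  # toggled off for response generation

    def __call__(self, h: np.ndarray) -> np.ndarray:
        if not self.reasoning_phase:
            return h  # muted: hidden states pass through unchanged
        return h + h @ self.A @ self.B  # adapter delta applied only while reasoning

adapter = PhaseGatedAdapter(hidden_dim=8)
h = np.ones((1, 8))

adapter.reasoning_phase = False      # response-generation phase
assert np.array_equal(adapter(h), h)  # exactly the frozen backbone's output
```

Because the muted path is an identity function rather than a small learned residual, preservation of the backbone's behavior on benign inputs holds exactly, not approximately.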
Key Contributions
- First prompt injection defense leveraging fine-grained reasoning paths to simultaneously detect attacks, localize the injection, and guide safe response generation
- Lightweight removable adapter architecture that mathematically preserves backbone utility on benign inputs, eliminating the alignment-tax trade-off of fine-tuning-based defenses
- Novel Zero-Copy KV Cache Reuse strategy integrated into the vLLM serving engine via custom kernels, eliminating redundant prefill computation to achieve high-throughput defense
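The KV Cache Reuse idea in the last contribution can be sketched with a toy cache. The interfaces below are hypothetical stand-ins, not vLLM's actual API: the point is only that the prefill over the shared prompt prefix runs once for the reasoning/detection pass, and the response-generation pass then reuses the same cached blocks by reference instead of recomputing them, as a decoupled detect-then-respond pipeline would.

```python
class ToyKVCache:
    """Illustrative stand-in for a paged KV cache keyed by prompt prefix."""

    def __init__(self):
        self.store = {}        # prompt prefix -> cached "KV blocks" (stand-in)
        self.prefill_calls = 0

    def prefill(self, prompt: str):
        if prompt not in self.store:
            self.prefill_calls += 1  # the expensive prefill happens only once
            # stand-in for per-token key/value tensors
            self.store[prompt] = [hash(tok) for tok in prompt.split()]
        return self.store[prompt]    # returned by reference: no copy is made

cache = ToyKVCache()
prompt = "system instructions + user query + retrieved context"

kv_detect = cache.prefill(prompt)   # reasoning/detection pass fills the cache
kv_respond = cache.prefill(prompt)  # response pass reuses the same blocks

assert kv_respond is kv_detect      # zero-copy: same objects, not a duplicate
assert cache.prefill_calls == 1     # redundant prefill eliminated
```

In a real serving engine the "blocks" would be attention key/value tensors in paged GPU memory, and the reuse would be mediated by custom kernels; this sketch only captures the single-prefill accounting that yields the throughput gain.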