defense 2025

PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization

Runpeng Geng , Yanting Wang , Chenlong Yin , Minhao Cheng , Ying Chen , Jinyuan Jia

The Pennsylvania State University

3 citations · 1 influential · arXiv

Published on arXiv

2511.10720

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

PISanitizer successfully prevents prompt injection in long-context LLMs while maintaining utility and outperforming existing defenses, including against optimization-based adaptive attacks.

PISanitizer

Novel technique introduced

Long context LLMs are vulnerable to prompt injection, where an attacker can inject an instruction in a long context to induce an LLM to generate an attacker-desired output. Existing prompt injection defenses are designed for short contexts. When extended to long-context scenarios, they have limited effectiveness. The reason is that an injected instruction constitutes only a very small portion of a long context, making the defense very challenging. In this work, we propose PISanitizer, which first pinpoints and sanitizes potential injected tokens (if any) in a context before letting a backend LLM generate a response, thereby eliminating the influence of the injected instruction. To sanitize injected tokens, PISanitizer builds on two observations: (1) prompt injection attacks essentially craft an instruction that compels an LLM to follow it, and (2) LLMs intrinsically leverage the attention mechanism to focus on crucial input tokens for output generation. Guided by these two observations, we first intentionally let an LLM follow arbitrary instructions in a context and then sanitize tokens receiving high attention that drive the instruction-following behavior of the LLM. By design, PISanitizer presents a dilemma for an attacker: the more effectively an injected instruction compels an LLM to follow it, the more likely it is to be sanitized by PISanitizer. Our extensive evaluation shows that PISanitizer can successfully prevent prompt injection, maintain utility, outperform existing defenses, is efficient, and is robust to optimization-based and strong adaptive attacks. The code is available at https://github.com/sleeepeer/PISanitizer.

Key Contributions

Attention-based token sanitization that identifies injected instructions by exploiting the LLM's own instruction-following attention signals
Creates an adversarial dilemma: the more compelling the injected instruction, the higher its attention scores and the more likely it is to be sanitized
Demonstrates effectiveness in long-context scenarios where existing defenses fail, with robustness to optimization-based and adaptive attacks

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llmtransformer

Threat Tags

inference_timeblack_box

Applications

long-context llm systemsrag systemsllm document processing

Read PDF arXiv DOI Code

PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

PIShield: Detecting Prompt Injection Attacks via Intrinsic LLM Features

Securing AI Agents Against Prompt Injection Attacks

Beyond the Benchmark: Innovative Defenses Against Prompt Injection Attacks

SecureCAI: Injection-Resilient LLM Assistants for Cybersecurity Operations

CivicShield: A Cross-Domain Defense-in-Depth Framework for Securing Government-Facing AI Chatbots Against Multi-Turn Adversarial Attacks

ExpGuard: LLM Content Moderation in Specialized Domains

Auto-Tuning Safety Guardrails for Black-Box Large Language Models

SecInfer: Preventing Prompt Injection via Inference-time Scaling