defense 2025

PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization

Runpeng Geng , Yanting Wang , Chenlong Yin , Minhao Cheng , Ying Chen , Jinyuan Jia

3 citations · 1 influential · arXiv

α

Published on arXiv

2511.10720

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

PISanitizer successfully prevents prompt injection in long-context LLMs while maintaining utility and outperforming existing defenses, including against optimization-based adaptive attacks.

PISanitizer

Novel technique introduced


Long context LLMs are vulnerable to prompt injection, where an attacker can inject an instruction in a long context to induce an LLM to generate an attacker-desired output. Existing prompt injection defenses are designed for short contexts. When extended to long-context scenarios, they have limited effectiveness. The reason is that an injected instruction constitutes only a very small portion of a long context, making the defense very challenging. In this work, we propose PISanitizer, which first pinpoints and sanitizes potential injected tokens (if any) in a context before letting a backend LLM generate a response, thereby eliminating the influence of the injected instruction. To sanitize injected tokens, PISanitizer builds on two observations: (1) prompt injection attacks essentially craft an instruction that compels an LLM to follow it, and (2) LLMs intrinsically leverage the attention mechanism to focus on crucial input tokens for output generation. Guided by these two observations, we first intentionally let an LLM follow arbitrary instructions in a context and then sanitize tokens receiving high attention that drive the instruction-following behavior of the LLM. By design, PISanitizer presents a dilemma for an attacker: the more effectively an injected instruction compels an LLM to follow it, the more likely it is to be sanitized by PISanitizer. Our extensive evaluation shows that PISanitizer can successfully prevent prompt injection, maintain utility, outperform existing defenses, is efficient, and is robust to optimization-based and strong adaptive attacks. The code is available at https://github.com/sleeepeer/PISanitizer.


Key Contributions

  • Attention-based token sanitization that identifies injected instructions by exploiting the LLM's own instruction-following attention signals
  • Creates an adversarial dilemma: the more compelling the injected instruction, the higher its attention scores and the more likely it is to be sanitized
  • Demonstrates effectiveness in long-context scenarios where existing defenses fail, with robustness to optimization-based and adaptive attacks

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llmtransformer
Threat Tags
inference_timeblack_box
Applications
long-context llm systemsrag systemsllm document processing