defense 2025

Attention is All You Need to Defend Against Indirect Prompt Injection Attacks in LLMs

Yinan Zhong , Qianhao Miao , Yanjiao Chen , Jiangyi Deng , Yushi Cheng , Wenyuan Xu

Zhejiang University

2 citations · 93 references · arXiv

Published on arXiv

2512.08417

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Rennervate outperforms 15 commercial and academic IPI defense methods with high precision on 5 LLMs and 6 datasets, while remaining transferable to unseen attacks and robust against adaptive adversaries.

Novel technique introduced

Rennervate

Large Language Models (LLMs) have been integrated into many applications (e.g., web agents) to perform more sophisticated tasks. However, LLM-empowered applications are vulnerable to Indirect Prompt Injection (IPI) attacks, where instructions are injected via untrustworthy external data sources. This paper presents Rennervate, a defense framework to detect and prevent IPI attacks. Rennervate leverages attention features to detect the covert injection at a fine-grained token level, enabling precise sanitization that neutralizes IPI attacks while maintaining LLM functionalities. Specifically, the token-level detector is materialized with a 2-step attentive pooling mechanism, which aggregates attention heads and response tokens for IPI detection and sanitization. Moreover, we establish a fine-grained IPI dataset, FIPI, to be open-sourced to support further research. Extensive experiments verify that Rennervate outperforms 15 commercial and academic IPI defense methods, achieving high precision on 5 LLMs and 6 datasets. We also demonstrate that Rennervate is transferable to unseen attacks and robust against adaptive adversaries.

Key Contributions

Rennervate: an attention-feature-based defense framework that detects and sanitizes indirect prompt injection at a fine-grained token level
2-step attentive pooling mechanism that aggregates attention heads and response tokens for IPI detection and sanitization
FIPI: a fine-grained IPI dataset, to be open-sourced to support further research
Evaluation: Rennervate outperforms 15 commercial and academic IPI defenses across 5 LLMs and 6 datasets
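
To make the 2-step pooling idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the tensor layout (response tokens × heads × data tokens), the softmax-based pooling weights, the names two_step_pool and sanitize, and the threshold are all assumptions chosen for illustration. In the paper the detector is trained, whereas the weights here are fixed toy values.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def two_step_pool(attn, head_logits, temperature=1.0):
    """Hypothetical 2-step pooling over attention features.

    attn: array of shape (R, H, T), the attention paid by each of R response
          tokens, in each of H heads, to each of T external-data tokens
          (assumed layout).
    head_logits: array of shape (H,), per-head importance logits (learned in a
          real detector, fixed here).
    Returns one score per data token, shape (T,).
    """
    # Step 1: aggregate attention heads with softmax-normalized weights.
    head_w = softmax(head_logits, axis=0)                  # (H,)
    per_response = np.einsum("rht,h->rt", attn, head_w)    # (R, T)
    # Step 2: aggregate response tokens, weighting each response token by how
    # strongly it attends to the data token in question.
    resp_w = softmax(per_response / temperature, axis=0)   # (R, T)
    return (resp_w * per_response).sum(axis=0)             # (T,)

def sanitize(tokens, scores, threshold):
    # Drop data tokens whose score marks them as likely injected.
    return [tok for tok, s in zip(tokens, scores) if s < threshold]

# Toy example: the last three data tokens receive unusually high attention.
tokens = ["The", "weather", "is", "sunny", ".", "Ignore", "previous", "instructions"]
attn = np.full((3, 2, len(tokens)), 0.02)   # 3 response tokens, 2 heads
attn[:, :, 5:] = 0.3                        # synthetic "injected" span
scores = two_step_pool(attn, head_logits=np.zeros(2))
cleaned = sanitize(tokens, scores, threshold=0.1)
print(cleaned)
```

Because scoring happens per data token, only the flagged span is removed and the benign remainder of the external content is preserved, which is the property the summary describes as precise sanitization.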

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm, transformer

Threat Tags

inference_time, black_box

Datasets

FIPI

Applications

llm-powered applications, web agents

Similar Papers

PIShield: Detecting Prompt Injection Attacks via Intrinsic LLM Features

Securing AI Agents Against Prompt Injection Attacks

Beyond the Benchmark: Innovative Defenses Against Prompt Injection Attacks

SecureCAI: Injection-Resilient LLM Assistants for Cybersecurity Operations

CivicShield: A Cross-Domain Defense-in-Depth Framework for Securing Government-Facing AI Chatbots Against Multi-Turn Adversarial Attacks

ExpGuard: LLM Content Moderation in Specialized Domains

Auto-Tuning Safety Guardrails for Black-Box Large Language Models

SecInfer: Preventing Prompt Injection via Inference-time Scaling