defense 2025

PromptLocate: Localizing Prompt Injection Attacks

Yuqi Jia 1, Yupei Liu 2, Zedian Shao 1, Jinyuan Jia 2, Neil Zhenqiang Gong 1

8 citations · 1 influential · 46 references · arXiv

α

Published on arXiv

2510.12252

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

PromptLocate accurately localizes injected prompts across eight existing and eight adaptive prompt injection attacks

PromptLocate

Novel technique introduced


Prompt injection attacks deceive a large language model into completing an attacker-specified task instead of its intended task by contaminating its input data with an injected prompt, which consists of injected instruction(s) and data. Localizing the injected prompt within contaminated data is crucial for post-attack forensic analysis and data recovery. Despite its growing importance, prompt injection localization remains largely unexplored. In this work, we bridge this gap by proposing PromptLocate, the first method for localizing injected prompts. PromptLocate comprises three steps: (1) splitting the contaminated data into semantically coherent segments, (2) identifying segments contaminated by injected instructions, and (3) pinpointing segments contaminated by injected data. We show PromptLocate accurately localizes injected prompts across eight existing and eight adaptive attacks.


Key Contributions

  • PromptLocate: the first method to localize injected prompts within contaminated LLM input data, enabling post-attack forensic analysis and data recovery
  • A three-step pipeline: semantic segmentation of contaminated input, identification of segments containing injected instructions, and pinpointing of injected data segments
  • Evaluation across eight existing and eight adaptive prompt injection attacks demonstrating accurate localization

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llmtransformer
Threat Tags
inference_timeblack_box
Applications
llm-integrated systemsrag pipelinesprompt injection forensics