AttnTrace: Attention-based Context Traceback for Long-Context LLMs
Yanting Wang , Runpeng Geng , Ying Chen , Jinyuan Jia
Published on arXiv
2508.03793
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
AttnTrace outperforms SOTA context traceback methods (e.g., TracLLM) in both accuracy and efficiency, reducing traceback time from hundreds of seconds to near real-time, while improving prompt injection detection under long-context settings
AttnTrace
Novel technique introduced
Long-context large language models (LLMs), such as Gemini-2.5-Pro and Claude-Sonnet-4, are increasingly used to empower advanced AI systems, including retrieval-augmented generation (RAG) pipelines and autonomous agents. In these systems, an LLM receives an instruction along with a context--often consisting of texts retrieved from a knowledge database or memory--and generates a response that is contextually grounded by following the instruction. Recent studies have designed solutions to trace back to a subset of texts in the context that contributes most to the response generated by the LLM. These solutions have numerous real-world applications, including performing post-attack forensic analysis and improving the interpretability and trustworthiness of LLM outputs. While significant efforts have been made, state-of-the-art solutions such as TracLLM often lead to a high computation cost, e.g., it takes TracLLM hundreds of seconds to perform traceback for a single response-context pair. In this work, we propose AttnTrace, a new context traceback method based on the attention weights produced by an LLM for a prompt. To effectively utilize attention weights, we introduce two techniques designed to enhance the effectiveness of AttnTrace, and we provide theoretical insights for our design choice. We also perform a systematic evaluation for AttnTrace. The results demonstrate that AttnTrace is more accurate and efficient than existing state-of-the-art context traceback methods. We also show that AttnTrace can improve state-of-the-art methods in detecting prompt injection under long contexts through the attribution-before-detection paradigm. As a real-world application, we demonstrate that AttnTrace can effectively pinpoint injected instructions in a paper designed to manipulate LLM-generated reviews. The code is at https://github.com/Wang-Yanting/AttnTrace.
Key Contributions
- AttnTrace: an attention-weight-based context traceback method that identifies which retrieved context segments most influenced an LLM response, more accurately and efficiently than SOTA methods like TracLLM
- Two techniques for effective attention weight utilization with theoretical justification, enabling tractable traceback for long contexts
- Attribution-before-detection paradigm that improves SOTA prompt injection detection in long-context LLMs, with demonstrated forensic application to real-world adversarial papers targeting LLM reviewers