The Vulnerability of LLM Rankers to Prompt Injection Attacks
Yu Yin 1, Shuai Wang 1, Bevan Koopman 1, Guido Zuccon 1,2
Published on arXiv
2602.16752
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Encoder-decoder LLM architectures exhibit strong inherent resilience to jailbreak prompt injection in ranking tasks, a finding not previously characterized in the literature
Decision Objective Hijacking / Decision Criteria Hijacking
Novel technique introduced
Large Language Models (LLMs) have emerged as powerful re-rankers. Recent research has however showed that simple prompt injections embedded within a candidate document (i.e., jailbreak prompt attacks) can significantly alter an LLM's ranking decisions. While this poses serious security risks to LLM-based ranking pipelines, the extent to which this vulnerability persists across diverse LLM families, architectures, and settings remains largely under-explored. In this paper, we present a comprehensive empirical study of jailbreak prompt attacks against LLM rankers. We focus our evaluation on two complementary tasks: (1) Preference Vulnerability Assessment, measuring intrinsic susceptibility via attack success rate (ASR); and (2) Ranking Vulnerability Assessment, quantifying the operational impact on the ranking's quality (nDCG@10). We systematically examine three prevalent ranking paradigms (pairwise, listwise, setwise) under two injection variants: decision objective hijacking and decision criteria hijacking. Beyond reproducing prior findings, we expand the analysis to cover vulnerability scaling across model families, position sensitivity, backbone architectures, and cross-domain robustness. Our results characterize the boundary conditions of these vulnerabilities, revealing critical insights such as that encoder-decoder architectures exhibit strong inherent resilience to jailbreak attacks. We publicly release our code and additional experimental results at https://github.com/ielab/LLM-Ranker-Attack.
Key Contributions
- Comprehensive empirical study of jailbreak prompt injection attacks on LLM rankers across three ranking paradigms (pairwise, listwise, setwise) and diverse LLM families
- Two formalized injection variants: decision objective hijacking and decision criteria hijacking, with dual evaluation via ASR and nDCG@10
- Characterization of vulnerability boundary conditions, showing encoder-decoder architectures exhibit strong inherent resilience while decoder-only models are highly susceptible