benchmark 2026

The Vulnerability of LLM Rankers to Prompt Injection Attacks

Yu Yin 1, Shuai Wang 1, Bevan Koopman 1, Guido Zuccon 1,2

0 citations · 38 references · arXiv (Cornell University)

α

Published on arXiv

2602.16752

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Encoder-decoder LLM architectures exhibit strong inherent resilience to jailbreak prompt injection in ranking tasks, a finding not previously characterized in the literature

Decision Objective Hijacking / Decision Criteria Hijacking

Novel technique introduced


Large Language Models (LLMs) have emerged as powerful re-rankers. Recent research has however showed that simple prompt injections embedded within a candidate document (i.e., jailbreak prompt attacks) can significantly alter an LLM's ranking decisions. While this poses serious security risks to LLM-based ranking pipelines, the extent to which this vulnerability persists across diverse LLM families, architectures, and settings remains largely under-explored. In this paper, we present a comprehensive empirical study of jailbreak prompt attacks against LLM rankers. We focus our evaluation on two complementary tasks: (1) Preference Vulnerability Assessment, measuring intrinsic susceptibility via attack success rate (ASR); and (2) Ranking Vulnerability Assessment, quantifying the operational impact on the ranking's quality (nDCG@10). We systematically examine three prevalent ranking paradigms (pairwise, listwise, setwise) under two injection variants: decision objective hijacking and decision criteria hijacking. Beyond reproducing prior findings, we expand the analysis to cover vulnerability scaling across model families, position sensitivity, backbone architectures, and cross-domain robustness. Our results characterize the boundary conditions of these vulnerabilities, revealing critical insights such as that encoder-decoder architectures exhibit strong inherent resilience to jailbreak attacks. We publicly release our code and additional experimental results at https://github.com/ielab/LLM-Ranker-Attack.


Key Contributions

  • Comprehensive empirical study of jailbreak prompt injection attacks on LLM rankers across three ranking paradigms (pairwise, listwise, setwise) and diverse LLM families
  • Two formalized injection variants: decision objective hijacking and decision criteria hijacking, with dual evaluation via ASR and nDCG@10
  • Characterization of vulnerability boundary conditions, showing encoder-decoder architectures exhibit strong inherent resilience while decoder-only models are highly susceptible

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llmtransformer
Threat Tags
inference_timeblack_box
Datasets
TREC
Applications
information retrievaldocument rankingllm-based search pipelines