Observation-Free Attacks on Online Learning to Rank
Sameep Chattopadhyay¹, Nikhil Karamchandani², Sharayu Moharir²
Published on arXiv (arXiv:2509.22855)
Model Skewing
OWASP ML Top 10 — ML08
Key Finding
Both attack strategies require only O(log T) reward manipulations to keep the target items in the top-K recommendation list for T - o(T) rounds and to induce linear regret in CascadeUCB1 and PBM-UCB, all without observing the learner's internal state.
CascadeOFA / PBMOFA
Novel technique introduced
Online learning to rank (OLTR) plays a critical role in information retrieval and machine learning systems, with a wide range of applications in search engines and content recommenders. However, despite their extensive adoption, the susceptibility of OLTR algorithms to coordinated adversarial attacks remains poorly understood. In this work, we present a novel framework for attacking widely used OLTR algorithms. Our framework is designed to promote a set of target items so that they appear in the list of top-K recommendations for T - o(T) rounds, while simultaneously inducing linear regret in the learning algorithm. We propose two novel attack strategies: CascadeOFA for CascadeUCB1 and PBMOFA for PBM-UCB. We provide theoretical guarantees showing that both strategies require only O(log T) manipulations to succeed. Additionally, we supplement our theoretical analysis with empirical results on real-world data.
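To make the victim concrete, here is a minimal simulator of CascadeUCB1 (Kveton et al., 2015), the first algorithm the paper targets. The helper name, parameters, and seed are illustrative, not from the paper; the learner ranks items by UCB index and receives cascade feedback (the user scans top-down and clicks the first attractive item).

```python
import math
import random

def cascade_ucb1(weights, K, T, seed=0):
    """Minimal CascadeUCB1 simulator (illustrative sketch).

    weights: true attraction probabilities, one per item.
    K: length of the recommended list.
    T: time horizon (number of rounds).
    Returns per-item empirical attraction estimates and observation counts.
    """
    rng = random.Random(seed)
    L = len(weights)
    w_hat = [0.0] * L  # empirical attraction estimates
    n = [0] * L        # number of times each item was observed
    # Initialization: observe each item once.
    for e in range(L):
        w_hat[e] = float(rng.random() < weights[e])
        n[e] = 1
    for t in range(L + 1, T + 1):
        # UCB index for each item.
        ucb = [w_hat[e] + math.sqrt(1.5 * math.log(t) / n[e]) for e in range(L)]
        ranked = sorted(range(L), key=lambda e: -ucb[e])[:K]
        # Cascade feedback: user scans top-down, clicks first attractive item.
        click = next((i for i, e in enumerate(ranked)
                      if rng.random() < weights[e]), None)
        # Only items up to (and including) the clicked position are observed.
        observed = ranked if click is None else ranked[:click + 1]
        for i, e in enumerate(observed):
            reward = 1.0 if (click is not None and i == click) else 0.0
            w_hat[e] = (n[e] * w_hat[e] + reward) / (n[e] + 1)
            n[e] += 1
    return w_hat, n
```

Because the learner only trusts its running estimates `w_hat` and counts `n`, any adversary that can corrupt the click signal is corrupting exactly the state this update loop maintains.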
Key Contributions
- First observation-free attack framework for OLTR algorithms under cascade and position-based click feedback models
- Two concrete attack strategies — CascadeOFA targeting CascadeUCB1 and PBMOFA targeting PBM-UCB — with theoretical guarantees of O(log T) required manipulations
- Proof that both attacks promote target items in top-K recommendations for T-o(T) rounds with high probability while inducing linear regret in the victim algorithm, validated on MovieLens
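The paper's exact manipulation rules for CascadeOFA and PBMOFA are not reproduced here; the following is a hypothetical sketch of what an observation-free, O(log T)-budget attack can look like. Because the attacker never sees the learner's internal state, it pre-commits to a set of rounds in which feedback is corrupted; the schedule size, constant `c`, and function names below are all assumptions for illustration.

```python
import math

def attack_schedule(T, c=3.0):
    """Hypothetical observation-free schedule: corrupt only the first
    ceil(c * log T) rounds, staying within an O(log T) manipulation budget.
    (Illustrative; not the paper's actual rule.)"""
    budget = math.ceil(c * math.log(T))
    return set(range(1, budget + 1))

def corrupt_feedback(t, schedule, ranked, target, true_click):
    """During scheduled rounds, force a click on the target item whenever it
    is shown and suppress clicks on all other items; outside the schedule,
    pass the genuine click position through unchanged."""
    if t not in schedule:
        return true_click
    if target in ranked:
        return ranked.index(target)  # forced click on the target item
    return None                      # suppress clicks on competitors
```

The intuition is that a logarithmic number of forced clicks inflates the target's empirical estimate while starving competitors, after which the learner's own confidence bounds keep the target in the top-K list with no further intervention.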
🛡️ Threat Analysis
The attacks (CascadeOFA, PBMOFA) exploit the feedback loop of online bandit-style learning algorithms: by manipulating click/reward signals over the T rounds, they gradually skew the OLTR algorithm's ranking behavior. This directly matches ML08's 'feedback loop attacks' and 'reward hacking over time in RL-like systems', i.e. temporal, incremental poisoning.
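The linear-regret claim follows directly from the promotion guarantee. Writing $\Delta > 0$ for the attraction gap between the displaced optimal item and the promoted target (notation assumed here, not taken from the paper), every round in which a suboptimal target occupies a top-K slot costs at least $\Delta$, so

```latex
R(T) \;\ge\; \Delta \,\bigl(T - o(T)\bigr) \;=\; \Omega(T).
```

Thus the attacker converts an algorithm with $O(\log T)$ regret guarantees into one suffering linear regret, using only $O(\log T)$ corrupted rounds.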