RobustMask: Certified Robustness against Adversarial Neural Ranking Attack via Randomized Masking
Jiawei Liu 1, Zhuo Chen 1, Rui Zhu 2, Miaokun Chen 1, Yuyang Gong 1, Wei Lu 1, Xiaofeng Wang 3
Published on arXiv
2512.23307
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
RobustMask certifies over 20% of candidate documents within the top-10 ranking positions against adversarial perturbations affecting up to 30% of document content.
RobustMask
Novel technique introduced
Neural ranking models have achieved remarkable progress and are now widely deployed in real-world applications such as Retrieval-Augmented Generation (RAG). However, like other neural architectures, they remain vulnerable to adversarial manipulations: subtle character-, word-, or phrase-level perturbations can poison retrieval results and artificially promote targeted candidates, undermining the integrity of search engines and downstream systems. Existing defenses either rely on heuristics with poor generalization or on certified methods that assume overly strong adversarial knowledge, limiting their practical use. To address these challenges, we propose RobustMask, a novel defense that combines the context-prediction capability of pretrained language models with a randomized masking-based smoothing mechanism. Our approach strengthens neural ranking models against adversarial perturbations at the character, word, and phrase levels. Leveraging both the pairwise comparison ability of ranking models and probabilistic statistical analysis, we provide a theoretical proof of RobustMask's certified top-K robustness. Extensive experiments further demonstrate that RobustMask successfully certifies over 20% of candidate documents within the top-10 ranking positions against adversarial perturbations affecting up to 30% of their content. These results highlight the effectiveness of RobustMask in enhancing the adversarial robustness of neural ranking models, marking a significant step toward providing stronger security guarantees for real-world retrieval systems.
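The core mechanism described in the abstract can be illustrated with a minimal sketch: randomly mask a fraction of each document's tokens, score the masked copies against the query, and take a majority vote over pairwise comparisons. This is an assumption-laden simplification — the `reconstruct` hook stands in for the pretrained LM's context prediction (here a no-op), and `score_fn` stands in for the neural ranking model (the test uses a toy term-overlap scorer); all names are hypothetical, not the paper's API.

```python
import random

MASK = "[MASK]"

def randomly_mask(tokens, mask_rate, rng):
    """Replace each token with the mask symbol independently with probability mask_rate."""
    return [MASK if rng.random() < mask_rate else tok for tok in tokens]

def reconstruct(tokens):
    """Placeholder for the pretrained LM that fills masked positions from context.
    The paper uses a masked language model here; this sketch leaves masks in place."""
    return tokens

def smoothed_preference(score_fn, query, doc_a, doc_b,
                        mask_rate=0.3, n_samples=200, seed=0):
    """Fraction of randomized-masking trials in which doc_a outranks doc_b.
    A majority (> 0.5) means the smoothed ranker prefers doc_a."""
    rng = random.Random(seed)
    votes = 0
    for _ in range(n_samples):
        a = reconstruct(randomly_mask(doc_a.split(), mask_rate, rng))
        b = reconstruct(randomly_mask(doc_b.split(), mask_rate, rng))
        if score_fn(query, " ".join(a)) > score_fn(query, " ".join(b)):
            votes += 1
    return votes / n_samples
```

Because an attacker can only perturb a bounded fraction of tokens, each masked sample has a good chance of erasing the perturbed positions, so the vote aggregate is far more stable than any single forward pass — the same intuition behind randomized-smoothing certificates.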
Key Contributions
- RobustMask: a randomized masking-based smoothing defense that uses pretrained LM context prediction to reconstruct masked tokens before ranking
- Theoretical proof of certified top-K robustness for neural ranking models under character-, word-, and phrase-level adversarial perturbations
- Empirical demonstration certifying over 20% of candidate documents in top-10 positions against perturbations affecting up to 30% of document content
🛡️ Threat Analysis
The paper defends against inference-time adversarial perturbations at the character, word, and phrase level that manipulate neural ranking model outputs to artificially promote targeted documents, a classic input manipulation / adversarial evasion attack. RobustMask counters this with certified top-K robustness via randomized masking-based smoothing.
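The certification step of a smoothing defense typically turns Monte Carlo vote counts into a high-confidence statement about the true preference probability. The sketch below uses a one-sided Hoeffding lower confidence bound as a generic stand-in; the paper's actual certificate rests on its own probabilistic analysis of pairwise comparisons, so the bound and threshold here are illustrative assumptions, not RobustMask's theorem.

```python
import math

def hoeffding_lower_bound(votes, n_samples, alpha=0.001):
    """One-sided Hoeffding lower confidence bound on the true vote probability:
    with probability at least 1 - alpha, the true probability exceeds this value."""
    p_hat = votes / n_samples
    return p_hat - math.sqrt(math.log(1.0 / alpha) / (2.0 * n_samples))

def is_certified(votes, n_samples, threshold=0.5, alpha=0.001):
    """Certify the smoothed pairwise preference if even the pessimistic
    lower bound on the vote probability clears the threshold."""
    return hoeffding_lower_bound(votes, n_samples, alpha) > threshold
```

A document pair with 950 wins out of 1000 masked trials clears the bound comfortably, while a 520/1000 split does not — the margin over 0.5, not the raw majority, is what a certificate consumes.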