One Word is Enough: Minimal Adversarial Perturbations for Neural Text Ranking
Tanmay Karmakar, Sourav Saha, Debapriyo Majumdar, Surjyanee Halder
Published on arXiv (arXiv:2601.20283)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Single-word attacks achieve up to 91% success rate on BERT and monoT5 re-rankers while modifying fewer than two tokens per document on average, outperforming PRADA in edit efficiency under comparable white-box settings.
Query Center Attack
Novel technique introduced
Neural ranking models (NRMs) achieve strong retrieval effectiveness, yet prior work has shown they are vulnerable to adversarial perturbations. We revisit this robustness question with a minimal, query-aware attack that promotes a target document by inserting or substituting a single, semantically aligned word: the query center. We study heuristic and gradient-guided variants, including a white-box method that identifies influential insertion points. On TREC-DL 2019/2020 with BERT and monoT5 re-rankers, our single-word attacks achieve up to 91% success while modifying fewer than two tokens per document on average, delivering rank and score boosts competitive with PRADA using far fewer edits under a comparable white-box setup for fair evaluation. We also introduce new diagnostic metrics to analyze attack sensitivity beyond aggregate success rates. Our analysis reveals a Goldilocks zone in which mid-ranked documents are most vulnerable. These findings demonstrate practical risks and motivate future defenses for robust neural ranking.
Key Contributions
- Minimal query-aware adversarial attack on neural re-rankers using single-word insertion/substitution (query center) achieving up to 91% success rate
- Gradient-guided white-box variant that identifies influential insertion points for targeted rank promotion
- New diagnostic metrics revealing a 'Goldilocks zone' where mid-ranked documents are most vulnerable to single-word perturbations
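The gradient-guided variant scores insertion points by how strongly the model's relevance score responds to changes at each token position. The paper does not publish its implementation; the sketch below only illustrates the general idea of gradient-based saliency, using a toy differentiable scorer (a per-token sigmoid score, not a real re-ranker) whose gradients can be computed analytically. All names (`saliency`, the scorer form) are assumptions for illustration.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def saliency(token_embs: list[list[float]], w: list[float]) -> list[float]:
    """Per-position gradient magnitude for a toy scorer
        score(doc) = sum_i sigmoid(w . e_i),
    standing in for a neural re-ranker's relevance score.
    d score / d e_i = sigmoid'(w . e_i) * w, so the gradient norm at
    position i is sigmoid'(z_i) * ||w||. Positions with larger norms are
    the "influential insertion points" a white-box attacker would target.
    """
    w_norm = math.sqrt(sum(x * x for x in w))
    sals = []
    for e in token_embs:
        z = sum(wi * ei for wi, ei in zip(w, e))
        s = sigmoid(z)
        sals.append(s * (1.0 - s) * w_norm)  # |sigmoid'(z)| * ||w||
    return sals

def best_insertion_point(token_embs, w) -> int:
    """Index of the most score-sensitive token position."""
    sals = saliency(token_embs, w)
    return max(range(len(sals)), key=sals.__getitem__)
```

With a real model the per-position gradients would come from backpropagation through the embedding layer; the selection step (insert next to the highest-saliency position) is the same.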
🛡️ Threat Analysis
Proposes inference-time adversarial input attacks: inserting or substituting a single word in documents to manipulate neural ranking model outputs. Both heuristic and gradient-guided (white-box) variants craft minimal perturbations causing target documents to be incorrectly promoted in ranked results.
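The heuristic attack can be pictured as a greedy search: try each candidate word at each insertion point and keep the single perturbation that most improves the target document's score. The sketch below is a minimal illustration of that loop, with a toy bag-of-words scorer standing in for the BERT/monoT5 re-rankers and the candidate set simplified to the raw query terms; it is not the authors' query-center selection method, and all names here are illustrative assumptions.

```python
def toy_rerank_score(query: str, doc: str) -> float:
    """Stand-in for a neural re-ranker: fraction of query terms
    that appear in the document. A real attack would query the
    target NRM here instead."""
    q_terms = query.lower().split()
    d_terms = set(doc.lower().split())
    return sum(t in d_terms for t in q_terms) / len(q_terms)

def single_word_insertion_attack(query: str, doc: str,
                                 score_fn=toy_rerank_score):
    """Greedy single-word attack: try every candidate word at every
    insertion position and return (best_score, perturbed_doc).
    Candidates are simplified to the query's own terms."""
    tokens = doc.split()
    best = (score_fn(query, doc), doc)  # start from the clean document
    for word in query.split():
        for pos in range(len(tokens) + 1):
            candidate = " ".join(tokens[:pos] + [word] + tokens[pos:])
            s = score_fn(query, candidate)
            if s > best[0]:
                best = (s, candidate)
    return best
```

A single inserted term is enough to shift this toy scorer, which mirrors the paper's core point: the edit budget is one word, yet the score (and hence the rank) of the target document moves.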