
One Word is Enough: Minimal Adversarial Perturbations for Neural Text Ranking

Tanmay Karmakar, Sourav Saha, Debapriyo Majumdar, Surjyanee Halder



Published on arXiv (2601.20283)

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Single-word attacks achieve up to 91% success rate on BERT and monoT5 re-rankers while modifying fewer than two tokens per document on average, outperforming PRADA in edit efficiency under comparable white-box settings.

Query Center Attack

Novel technique introduced


Neural ranking models (NRMs) achieve strong retrieval effectiveness, yet prior work has shown they are vulnerable to adversarial perturbations. We revisit this robustness question with a minimal, query-aware attack that promotes a target document by inserting or substituting a single, semantically aligned word: the query center. We study heuristic and gradient-guided variants, including a white-box method that identifies influential insertion points. On TREC-DL 2019/2020 with BERT and monoT5 re-rankers, our single-word attacks achieve up to 91% success while modifying fewer than two tokens per document on average, delivering rank and score boosts competitive with PRADA at far fewer edits, evaluated under a comparable white-box setup for fairness. We also introduce new diagnostic metrics to analyze attack sensitivity beyond aggregate success rates. Our analysis reveals a Goldilocks zone in which mid-ranked documents are most vulnerable. These findings demonstrate practical risks and motivate future defenses for robust neural ranking.
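The heuristic variant can be pictured with a toy sketch. Note the pieces below are illustrative assumptions, not the paper's method: a term-overlap scorer stands in for the BERT/monoT5 re-ranker, and rarity in a background corpus stands in for the paper's semantic alignment when choosing the query center.

```python
# Illustrative sketch of the heuristic single-word attack. The scorer,
# the rarity-based "query center" choice, and the exhaustive insertion
# search are all assumptions made for exposition.
from collections import Counter

def score(query, doc):
    # Toy relevance: query-term overlap (stand-in for a neural re-ranker).
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum(q[t] * d[t] for t in q)

def query_center(query, background):
    # Assumption: pick the rarest query term, using rarity as a crude
    # proxy for the paper's semantically aligned "query center".
    counts = Counter(background.lower().split())
    return min(query.lower().split(), key=lambda t: counts.get(t, 0))

def single_word_attack(query, doc, background):
    # Try inserting the query center at every gap; keep the best edit.
    word = query_center(query, background)
    tokens = doc.split()
    best_doc, best_score = doc, score(query, doc)
    for i in range(len(tokens) + 1):
        cand = " ".join(tokens[:i] + [word] + tokens[i:])
        s = score(query, cand)
        if s > best_score:
            best_doc, best_score = cand, s
    return best_doc, best_score

adv, s = single_word_attack("adversarial ranking attacks",
                            "search engines order documents by relevance",
                            "ranking ranking relevance documents search")
print(s)  # the single inserted word lifts the toy score from 0 to 1
```

A real attack would query the re-ranker (or its gradients) instead of this lexical scorer, but the edit budget is the same: exactly one inserted token.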


Key Contributions

  • Minimal query-aware adversarial attack on neural re-rankers using single-word insertion/substitution (query center) achieving up to 91% success rate
  • Gradient-guided white-box variant that identifies influential insertion points for targeted rank promotion
  • New diagnostic metrics revealing a 'Goldilocks zone' where mid-ranked documents are most vulnerable to single-word perturbations

🛡️ Threat Analysis

Input Manipulation Attack

Proposes inference-time adversarial input attacks — inserting or substituting a single word in documents to manipulate neural ranking model outputs. Both heuristic and gradient-guided (white-box) variants craft minimal perturbations causing target documents to be incorrectly promoted in ranked results.
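The white-box variant's position selection can be sketched with a saliency stand-in. Everything below is a hedged assumption for exposition: pseudo-embeddings replace model embeddings, and per-token dot-product contributions replace true gradients when choosing which token to substitute.

```python
# Illustrative sketch of gradient-guided position selection. The paper
# uses model gradients on BERT/monoT5; here each token's contribution
# to a toy embedding dot-product score acts as the saliency proxy, and
# the least-influential token is swapped for the query-aligned word.
import random

DIM = 16

def embed(token):
    # Deterministic pseudo-embedding seeded by the token string
    # (stand-in for the model's learned embeddings).
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

def query_vector(query_tokens):
    # Sum of query-token embeddings as a crude query representation.
    return [sum(vals) for vals in zip(*(embed(t) for t in query_tokens))]

def contribution(query_vec, token):
    # Saliency proxy: this token's dot-product contribution to the score.
    return sum(q * t for q, t in zip(query_vec, embed(token)))

def substitute_attack(query_tokens, doc, word):
    # Replace the least-influential document token with `word`,
    # mimicking gradient-guided selection of the edit position.
    qv = query_vector(query_tokens)
    toks = doc.split()
    i = min(range(len(toks)), key=lambda j: contribution(qv, toks[j]))
    toks[i] = word
    return " ".join(toks)

adv = substitute_attack(["neural", "ranking"],
                        "search engines order documents by relevance",
                        "ranking")
print(adv)
```

The one-token edit budget is preserved: the document length is unchanged, and only the position flagged as least influential is rewritten.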


Details

Domains
nlp
Model Types
transformer
Threat Tags
white_box, black_box, inference_time, targeted, digital
Datasets
TREC-DL 2019, TREC-DL 2020
Applications
neural text ranking, information retrieval, document re-ranking