PivotAttack: Rethinking the Search Trajectory in Hard-Label Text Attacks via Pivot Words

Yuzhi Liang , Shiliang Xiao , Jingsong Wei , Qiliang Lin , Xia Li

Published on arXiv (arXiv:2603.10842)

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

PivotAttack consistently outperforms state-of-the-art hard-label black-box text attacks in both Attack Success Rate and query efficiency across traditional models and LLMs, with strong results even against robust fine-tuned LLMs.

PivotAttack

Novel technique introduced


Existing hard-label text attacks often rely on inefficient "outside-in" strategies that traverse vast search spaces. We propose PivotAttack, a query-efficient "inside-out" framework. It employs a Multi-Armed Bandit algorithm to identify Pivot Sets (combinatorial token groups acting as prediction anchors) and strategically perturbs them to induce label flips. This approach captures inter-word dependencies and minimizes query costs. Extensive experiments across traditional models and Large Language Models demonstrate that PivotAttack consistently outperforms state-of-the-art baselines in both Attack Success Rate and query efficiency.
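The "perturb pivot words to induce label flips" step can be sketched as follows in the hard-label setting, where the attacker sees only the predicted class. Everything here is an illustrative stand-in, not the paper's implementation: `query_label` is a toy victim model, and `SYNONYMS` and `perturb_pivot_set` are hypothetical helpers.

```python
import random

# Hypothetical synonym table; a real attack would use embeddings or a thesaurus.
SYNONYMS = {
    "great": ["fine", "decent", "passable"],
    "film": ["movie", "picture", "feature"],
}

def query_label(text):
    # Toy stand-in for the victim model's hard-label API: labels text
    # "positive" iff it contains the word "great".
    return "positive" if "great" in text else "negative"

def perturb_pivot_set(tokens, pivot_ids, query, orig_label, max_queries=50):
    """Substitute synonyms only at pivot positions until the label flips.

    Returns (adversarial_tokens, queries_used), or (None, queries_used)
    if the query budget runs out.
    """
    queries = 0
    for _ in range(max_queries):
        cand = list(tokens)
        for i in pivot_ids:
            subs = SYNONYMS.get(cand[i])
            if subs:
                cand[i] = random.choice(subs)
        queries += 1
        if query(" ".join(cand)) != orig_label:
            return cand, queries
    return None, queries

random.seed(1)
tokens = "a great film about friendship".split()
adv, n = perturb_pivot_set(tokens, pivot_ids=[1], query=query_label,
                           orig_label="positive")
print(adv, n)
```

Because substitutions are confined to the pivot positions, each model query tests a small, targeted change rather than an "outside-in" pass over the whole sentence.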


Key Contributions

  • Novel "inside-out" attack strategy that identifies Pivot Sets (critical multi-token anchors) and perturbs them to efficiently cross the decision boundary, avoiding the query-expensive "outside-in" refinement of prior work
  • Formulation of Pivot Set identification as a Multi-Armed Bandit (KL-LUCB) problem, capturing inter-word dependencies rather than scoring tokens independently
  • Demonstrated effectiveness against both traditional NLP models and fine-tuned/zero-shot LLMs, outperforming SOTA baselines in attack success rate and query efficiency
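The second contribution, framing Pivot Set identification as a KL-LUCB bandit, can be sketched as a best-arm identification problem: each arm is a candidate token group, each pull spends one model query perturbing that group, and the reward is whether the label flips. The arm count, flip rates, and function names below are illustrative assumptions, not details from the paper.

```python
import math
import random

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_confidence_bound(p_hat, n, beta, upper):
    """Bisection for the KL-based bound: the most extreme q in [0, 1]
    with n * kl(p_hat, q) <= beta, above (UCB) or below (LCB) p_hat."""
    lo, hi = (p_hat, 1.0) if upper else (0.0, p_hat)
    for _ in range(40):
        mid = (lo + hi) / 2.0
        feasible = n * bernoulli_kl(p_hat, mid) <= beta
        if upper:
            lo, hi = (mid, hi) if feasible else (lo, mid)
        else:
            lo, hi = (lo, mid) if feasible else (mid, hi)
    return lo if upper else hi

def kl_lucb_best_arm(pull, n_arms, delta=0.1, max_pulls=2000):
    """Identify the arm (candidate pivot set) with the highest flip rate."""
    counts, sums = [0] * n_arms, [0.0] * n_arms
    for a in range(n_arms):                      # pull each arm once
        sums[a] += pull(a); counts[a] += 1
    t = n_arms
    while t < max_pulls:
        beta = math.log(n_arms * t ** 2 / delta)  # exploration rate
        means = [sums[a] / counts[a] for a in range(n_arms)]
        best = max(range(n_arms), key=lambda a: means[a])
        lcb = kl_confidence_bound(means[best], counts[best], beta, upper=False)
        chall = max((a for a in range(n_arms) if a != best),
                    key=lambda a: kl_confidence_bound(means[a], counts[a],
                                                      beta, upper=True))
        ucb = kl_confidence_bound(means[chall], counts[chall], beta, upper=True)
        if lcb >= ucb:          # best arm separated with confidence 1 - delta
            return best
        for a in (best, chall):  # sample only the two contested arms
            sums[a] += pull(a); counts[a] += 1
            t += 1
    return max(range(n_arms), key=lambda a: sums[a] / counts[a])

# Simulated flip rates for four hypothetical candidate pivot sets.
random.seed(0)
flip_probs = [0.05, 0.10, 0.60, 0.15]
best = kl_lucb_best_arm(
    lambda a: 1.0 if random.random() < flip_probs[a] else 0.0,
    n_arms=len(flip_probs))
print(best)
```

Sampling only the empirical best arm and its strongest challenger each round is what makes LUCB-style identification query-efficient: queries concentrate on the token groups that still matter, mirroring the paper's stated goal of minimizing query cost.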

🛡️ Threat Analysis

Input Manipulation Attack

Proposes a novel adversarial example attack on NLP text classifiers in the hard-label black-box setting — the attacker crafts word substitutions to cause misclassification at inference time, directly targeting model input integrity across both traditional NLP models and LLMs used as classifiers.


Details

Domains
nlp
Model Types
transformer, llm
Threat Tags
black_box, inference_time, targeted, digital
Applications
text classification