attack · arXiv · Mar 11, 2026
Yuzhi Liang, Shiliang Xiao, Jingsong Wei et al. · Guangdong University of Foreign Studies
Hard-label black-box text attack using Multi-Armed Bandits to find pivot word groups that efficiently flip classifier labels
Input Manipulation Attack · nlp
Existing hard-label text attacks often rely on inefficient "outside-in" strategies that traverse vast search spaces. We propose PivotAttack, a query-efficient "inside-out" framework. It employs a Multi-Armed Bandit algorithm to identify Pivot Sets (combinatorial token groups that act as prediction anchors) and strategically perturbs them to induce label flips. This approach captures inter-word dependencies and minimizes query costs. Extensive experiments on both traditional models and Large Language Models demonstrate that PivotAttack consistently outperforms state-of-the-art baselines in both Attack Success Rate and query efficiency.
transformer · llm
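The abstract does not spell out the bandit formulation, so the following is only a minimal sketch of the general idea under stated assumptions, not PivotAttack itself. It assumes every pair of token positions is one arm, uses the standard UCB1 rule as a stand-in for whatever bandit algorithm the paper actually uses, and rewards an arm with 1 when randomly substituting its tokens flips the hard label of a toy lexicon classifier. `hard_label_oracle`, `WEIGHTS`, `SUBSTITUTES`, `perturb`, and `ucb1_pivot_search` are all hypothetical names introduced for illustration.

```python
import math
import random
from itertools import combinations

# Hypothetical toy victim: a lexicon sentiment scorer that exposes only its
# final label (1 = positive, 0 = negative), never the underlying score.
WEIGHTS = {"great": 2, "wonderful": 2, "nice": 1, "thin": -1}

def hard_label_oracle(tokens):
    score = sum(WEIGHTS.get(t, 0) for t in tokens)
    return 1 if score > 0 else 0

# Hypothetical substitution pool; "nice" keeps some positive signal, so even
# the right group flips the label only most of the time (a noisy reward).
SUBSTITUTES = ["fine", "okay", "nice"]

def perturb(tokens, group, rng):
    """Replace every token position in `group` with a random substitute."""
    out = list(tokens)
    for i in group:
        out[i] = rng.choice(SUBSTITUTES)
    return out

def ucb1_pivot_search(tokens, group_size=2, budget=400, seed=0):
    """UCB1 over candidate token groups (assumed arm definition).

    Reward is 1 when the perturbed text flips the hard label, else 0.
    Returns the most-pulled group, the first adversarial text found (or
    None), and the total number of oracle queries spent.
    """
    rng = random.Random(seed)
    original = hard_label_oracle(tokens)
    queries = 1  # one query to obtain the original label
    arms = list(combinations(range(len(tokens)), group_size))
    pulls = [0] * len(arms)
    wins = [0] * len(arms)
    adversarial = None
    t = 0
    while queries < budget:
        t += 1
        if t <= len(arms):
            a = t - 1  # initialization round: pull every arm once
        else:
            # UCB1 index: empirical flip rate plus an exploration bonus
            a = max(range(len(arms)),
                    key=lambda k: wins[k] / pulls[k]
                    + math.sqrt(2 * math.log(t) / pulls[k]))
        candidate = perturb(tokens, arms[a], rng)
        queries += 1
        flipped = hard_label_oracle(candidate) != original
        pulls[a] += 1
        wins[a] += int(flipped)
        if flipped and adversarial is None:
            adversarial = candidate
    best = max(range(len(arms)), key=lambda k: pulls[k])
    return arms[best], adversarial, queries
```

On the toy input `"great acting and wonderful music but thin plot".split()`, only the pair of positions holding "great" and "wonderful" can flip the label, and running the search with `group_size=1` finds no adversarial example at all. This mirrors, in miniature, the inter-word dependency the abstract refers to. Note that enumerating all groups grows as the binomial coefficient of sentence length and group size; this sketch makes no attempt at the pruning a practical attack would need.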