
Multi-task Adversarial Attacks against Black-box Model with Few-shot Queries

Wenqiang Wang 1,2, Yan Xiao 1,2, Hao Lin 1,2, Yangshijie Zhang 3, Xiaochun Cao 1,2


Published on arXiv (arXiv:2508.10039)

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

CEMA achieves significant attack success with as few as 100 queries across multi-task models spanning 2, 3, and 6 tasks, and successfully attacks commercial APIs and LLMs without mimicking victim model internals.

CEMA (Cluster and Ensemble Multi-task Text Adversarial Attack)

Novel technique introduced


Current multi-task adversarial text attacks rely on abundant access to shared internal features and numerous queries, and are often limited to a single task type. As a result, these attacks are less effective in practical scenarios involving black-box feedback APIs, limited queries, or multiple task types. To bridge this gap, we propose **C**luster and **E**nsemble **M**ulti-task Text Adversarial **A**ttack (**CEMA**), an effective black-box attack that exploits the transferability of adversarial texts across different tasks. CEMA simplifies complex multi-task scenarios by using a *deep-level substitute model* trained in a *plug-and-play* manner for text classification, enabling attacks without mimicking the victim model. This approach requires only a few queries for training, converts multi-task attacks into classification attacks, and allows attacks across various tasks. CEMA generates multiple adversarial candidates using different text classification attack methods and selects the one that most effectively attacks the substitute models. In experiments on multi-task models with two, three, or six tasks (spanning classification, translation, summarization, and text-to-image generation), CEMA achieves significant attack success with as few as 100 queries. Furthermore, CEMA can target commercial APIs (e.g., Baidu and Google Translate), large language models (e.g., ChatGPT 4o), and image-generation models (e.g., Stable Diffusion V2), demonstrating its versatility and effectiveness in real-world applications.
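The abstract's core move is to reduce a multi-task attack to a classification attack: query the victim a few times, cluster its outputs into pseudo-labels, and train a cheap substitute classifier on those labels. The paper does not spell out its exact clustering or substitute architecture here, so the following is a minimal illustrative sketch using toy output embeddings, a simple k-means routine, and a nearest-centroid substitute (all stand-ins, not the paper's implementation):

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Toy k-means with greedy farthest-point initialization for stability."""
    centroids = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])  # next seed: point farthest from all seeds
    centroids = np.stack(centroids)
    for _ in range(iters):
        labels = ((X[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(0)
    return centroids, labels

class NearestCentroidSubstitute:
    """Tiny substitute classifier fit on pseudo-labels from clustering."""
    def fit(self, X, y):
        self.centroids_ = np.stack([X[y == c].mean(0) for c in np.unique(y)])
        return self
    def predict(self, X):
        return ((X[:, None] - self.centroids_[None]) ** 2).sum(-1).argmin(1)

# Toy demo: 100 "victim output embeddings" from two well-separated behaviors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 4)), rng.normal(3.0, 0.1, (50, 4))])
cents, pseudo = kmeans(X, k=2)                      # pseudo-labels, no true labels used
sub = NearestCentroidSubstitute().fit(X, pseudo)    # substitute trained plug-and-play
```

Once such a substitute exists, any off-the-shelf classification attack can be run against it locally, which is what makes the query budget (~100 victim queries) plausible.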


Key Contributions

  • CEMA framework that converts multi-task adversarial attack problems into text classification attacks using a plug-and-play deep-level substitute model, requiring only ~100 queries
  • Adversarial candidate generation and ensemble selection strategy that exploits cross-task transferability of adversarial texts
  • Demonstrated attacks against diverse real-world targets including commercial APIs (Baidu/Google Translate), LLMs (ChatGPT 4o), and image-generation models (Stable Diffusion V2)
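The second contribution, ensemble candidate selection, can be sketched as follows: run several attack methods on the same input, then keep the candidate that fools the most substitute models. The attacker and substitute functions below are hypothetical stand-ins for real attack algorithms and trained substitutes:

```python
def select_candidate(text, label, attackers, substitutes):
    """Pick the adversarial candidate that flips the most substitute models."""
    candidates = [atk(text) for atk in attackers]
    def fooled(cand):
        return sum(1 for sub in substitutes if sub(cand) != label)
    return max(candidates, key=fooled)

# Toy stand-ins: three "attackers" and two keyword-based "substitutes".
attackers = [lambda t: t.replace("good", "decent"),
             lambda t: t.replace("good", "fine"),
             lambda t: t.upper()]
substitutes = [lambda t: 1 if "good" in t.lower() else 0,  # case-insensitive
               lambda t: 1 if "good" in t else 0]          # case-sensitive

best = select_candidate("a good movie", label=1,
                        attackers=attackers, substitutes=substitutes)
```

The intuition, per the abstract, is that a candidate which transfers across several substitutes is more likely to transfer to the unseen black-box victim as well.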

🛡️ Threat Analysis

Input Manipulation Attack

CEMA generates adversarial text examples (word-level perturbations) at inference time to cause incorrect outputs across diverse task types — this is the core ML01 threat of input manipulation / adversarial examples applied to NLP. The attack exploits transferability via a deep-level substitute model, targeting both traditional NLP models and commercial APIs including LLMs and text-to-image generators.
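To make the ML01 framing concrete, a word-level adversarial perturbation in its simplest form swaps individual words for near-synonyms until the model's prediction flips. The synonym table and toy sentiment classifier below are illustrative only, not CEMA's actual perturbation method:

```python
# Hypothetical hand-picked synonym table and keyword "classifier" for illustration.
SYNONYMS = {"terrible": ["awful", "dreadful"], "boring": ["dull", "tedious"]}

def toy_sentiment(text):
    """0 = negative if a flagged keyword appears, else 1 = positive."""
    return 0 if any(w in text for w in ("terrible", "boring")) else 1

def word_level_attack(text, model):
    """Greedily swap one word for a synonym until the prediction flips."""
    orig = model(text)
    words = text.split()
    for i, w in enumerate(words):
        for syn in SYNONYMS.get(w, []):
            perturbed = " ".join(words[:i] + [syn] + words[i + 1:])
            if model(perturbed) != orig:
                return perturbed
    return None  # no successful single-word perturbation found

adv = word_level_attack("a terrible plot", toy_sentiment)
```

Real attacks constrain such swaps to preserve semantics and fluency; the point here is only the inference-time, input-manipulation character of the threat.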


Details

Domains
nlp, multimodal
Model Types
llm, transformer, diffusion
Threat Tags
black_box, inference_time, targeted, digital
Applications
text classification, neural machine translation, text summarization, text-to-image generation