attack 2026

Stealthy and Adjustable Text-Guided Backdoor Attacks on Multimodal Pretrained Models

Yiyang Zhang 1, Chaojian Yu 1, Ziming Hong 2, Yuanjie Shao 1, Qinmu Peng 1, Tongliang Liu 2, Xinge You 1

0 citations

α

Published on arXiv

2604.05809

Model Poisoning

OWASP ML Top 10 — ML10

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

Achieves adjustable attack success rates across CIR and VQA tasks using stealthy text triggers in realistic settings

TGB

Novel technique introduced


Multimodal pretrained models are vulnerable to backdoor attacks, yet most existing methods rely on visual or multimodal triggers, which are impractical since visually embedded triggers rarely occur in real-world data. To overcome this limitation, we propose a novel Text-Guided Backdoor (TGB) attack on multimodal pretrained models, where commonly occurring words in textual descriptions serve as backdoor triggers, significantly improving stealthiness and practicality. Furthermore, we introduce visual adversarial perturbations on poisoned samples to modulate the model's learning of textual triggers, enabling a controllable and adjustable TGB attack. Extensive experiments on downstream tasks built upon multimodal pretrained models, including Composed Image Retrieval (CIR) and Visual Question Answering (VQA), demonstrate that TGB achieves practicality and stealthiness with adjustable attack success rates across diverse realistic settings, revealing critical security vulnerabilities in multimodal pretrained models.


Key Contributions

  • Novel text-guided backdoor attack using common words as triggers instead of visual patterns, improving stealthiness
  • Adjustable attack mechanism using adversarial perturbations to control backdoor strength
  • Demonstrates practical backdoor attacks on downstream tasks (CIR, VQA) built on multimodal pretrained models

🛡️ Threat Analysis

Data Poisoning Attack

Uses data poisoning as the attack vector — corrupts training data with poisoned image-text pairs to inject the backdoor during pretraining.

Model Poisoning

Proposes a backdoor attack that embeds hidden malicious behavior in multimodal pretrained models, triggered by specific textual phrases during inference.


Details

Domains
multimodalvisionnlp
Model Types
multimodaltransformer
Threat Tags
training_timetargeteddigital
Applications
composed image retrievalvisual question answeringmultimodal pretraining