
BadCLIP++: Stealthy and Persistent Backdoors in Multimodal Contrastive Learning

Siyuan Liang 1, Yongcheng Jing 1, Yingjie Wang 1, Jiaxing Huang 1, Ee-chien Chang 1, Dacheng Tao 2



Published on arXiv · 2602.17168

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

Achieves a 99.99% attack success rate (ASR) at only 0.3% poisoning, surpassing baselines by 11.4 points; ASR stays above 99.90% across 19 defenses, with 65.03% success in physical attacks.

BadCLIP++

Novel technique introduced


Research on backdoor attacks against multimodal contrastive learning models faces two key challenges: stealthiness and persistence. Existing methods often fail under strong detection or continuous fine-tuning, largely due to (1) cross-modal inconsistency that exposes trigger patterns and (2) gradient dilution at low poisoning rates that accelerates backdoor forgetting. These coupled causes remain insufficiently modeled and addressed. We propose BadCLIP++, a unified framework that tackles both challenges. For stealthiness, we introduce a semantic-fusion QR micro-trigger that embeds imperceptible patterns near task-relevant regions, preserving clean-data statistics while producing compact trigger distributions. We further apply target-aligned subset selection to strengthen signals at low injection rates. For persistence, we stabilize trigger embeddings via radius shrinkage and centroid alignment, and stabilize model parameters through curvature control and elastic weight consolidation, maintaining solutions within a low-curvature wide basin resistant to fine-tuning. We also provide the first theoretical analysis showing that, within a trust region, gradients from clean fine-tuning and backdoor objectives are co-directional, yielding a non-increasing upper bound on attack success degradation. Experiments demonstrate that with only 0.3% poisoning, BadCLIP++ achieves 99.99% attack success rate (ASR) in digital settings, surpassing baselines by 11.4 points. Across nineteen defenses, ASR remains above 99.90% with less than 0.8% drop in clean accuracy. The method further attains 65.03% success in physical attacks and shows robustness against watermark removal defenses.
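The persistence mechanism described above pulls poisoned-image embeddings into a tight cluster near the target representation. A minimal sketch of such an objective, assuming simple squared-distance terms (the function name, weighting, and exact formulation are illustrative, not the paper's):

```python
import numpy as np

def compactness_loss(trigger_embeds: np.ndarray, target_embed: np.ndarray) -> float:
    """Illustrative persistence objective combining two terms:
    - radius shrinkage: shrink the spread of poisoned embeddings
      around their own centroid;
    - centroid alignment: pull that centroid toward the target
      embedding (e.g. the target class's text embedding).
    Shapes: trigger_embeds is (n, d), target_embed is (d,)."""
    centroid = trigger_embeds.mean(axis=0)
    # Mean squared distance of each poisoned embedding to the cluster centroid
    radius = np.mean(np.sum((trigger_embeds - centroid) ** 2, axis=1))
    # Squared distance from the cluster centroid to the target embedding
    align = np.sum((centroid - target_embed) ** 2)
    return float(radius + align)
```

When both terms reach zero, every poisoned embedding coincides with the target, giving the compact trigger distribution the abstract describes.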


Key Contributions

  • Semantic-fusion QR micro-trigger that embeds imperceptible backdoor patterns near task-relevant semantic regions, maintaining clean-data statistics while achieving compact trigger distributions
  • Persistence mechanisms (radius shrinkage, centroid alignment, curvature control, elastic weight consolidation) that anchor backdoor behavior in wide, low-curvature loss basins resistant to fine-tuning
  • First theoretical proof of co-directionality between clean fine-tuning and backdoor gradients within a trust region, yielding a non-increasing upper bound on attack success degradation under defense
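Two of the listed mechanisms can be sketched concretely: elastic weight consolidation (EWC), a standard regularizer that penalizes moving parameters the (backdoored) model deems important, and an empirical check of the co-directionality claim via gradient cosine similarity. Both functions below are hedged illustrations under assumed notation, not the paper's implementation:

```python
import numpy as np

def ewc_penalty(params: np.ndarray, params_star: np.ndarray,
                fisher: np.ndarray, lam: float = 1.0) -> float:
    """Standard EWC-style penalty: parameters with high (diagonal)
    Fisher importance are anchored near their backdoored values
    params_star, resisting drift under later clean fine-tuning."""
    return float(0.5 * lam * np.sum(fisher * (params - params_star) ** 2))

def codirectional(g_clean: np.ndarray, g_backdoor: np.ndarray) -> bool:
    """Empirical test of the trust-region claim: the clean fine-tuning
    gradient and the backdoor gradient are co-directional when their
    cosine similarity is positive."""
    denom = np.linalg.norm(g_clean) * np.linalg.norm(g_backdoor) + 1e-12
    return float(np.dot(g_clean, g_backdoor) / denom) > 0.0
```

If `codirectional` holds throughout fine-tuning, each clean update also (weakly) decreases the backdoor loss, which is the intuition behind the non-increasing bound on ASR degradation.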

🛡️ Threat Analysis

Model Poisoning

The core contribution is a backdoor-injection framework that embeds hidden, trigger-activated malicious behavior into CLIP via poisoned training data (QR micro-triggers); the model behaves normally until triggered — a textbook ML10 case.


Details

Domains
multimodal, vision
Model Types
vlm, multimodal
Threat Tags
training_time, targeted, digital, physical
Datasets
CC3M, MSCOCO, Flickr30k, CIFAR-10
Applications
multimodal contrastive learning, zero-shot image classification, image-text retrieval