Defense · 2025

Pre-training CLIP against Data Poisoning with Optimal Transport-based Matching and Alignment

Tong Zhang 1, Kuofeng Gao 2, Jiawang Bai 2, Leo Yu Zhang 3, Xin Yin 1, Zonghui Wang 1, Shouling Ji 1, Wenzhi Chen 1

1 citation · 48 references · EMNLP


Published on arXiv: 2509.18717

Data Poisoning Attack (OWASP ML Top 10 — ML02)

Model Poisoning (OWASP ML Top 10 — ML10)

Key Finding

OTCCLIP reduces attack success rates for both targeted data poisoning and backdoor attacks while significantly improving CLIP's zero-shot and linear probing performance on poisoned datasets compared to prior defenses.

OTCCLIP

Novel technique introduced


Recent studies have shown that Contrastive Language-Image Pre-training (CLIP) models are threatened by targeted data poisoning and backdoor attacks, owing to the massive image-caption pairs crawled from the Internet for training. Previous defense methods correct poisoned image-caption pairs by matching a new caption to each image. However, the matching process relies solely on global representations of images and captions, overlooking fine-grained visual and textual features; it may therefore introduce incorrect image-caption pairs and harm CLIP pre-training. To address these limitations, we propose an optimal transport-based framework for reconstructing image-caption pairs, named OTCCLIP. We propose a new optimal transport-based distance measure between fine-grained visual and textual feature sets and re-assign captions based on this distance. Additionally, to further reduce the negative impact of residual mismatched pairs, we encourage inter- and intra-modality fine-grained alignment by employing optimal transport-based objective functions. Our experiments demonstrate that OTCCLIP successfully decreases the attack success rates of poisoning attacks and, compared to previous methods, significantly improves the zero-shot and linear probing performance of CLIP models trained on poisoned datasets.


Key Contributions

  • Optimal transport-based fine-grained distance measure between image patch and caption token feature sets for detecting and correcting poisoned image-caption pairs during CLIP pre-training
  • Caption re-assignment mechanism using OT transport matrices as weights to capture patch-token region correspondences, improving robustness over global-feature-only matching
  • Inter- and intra-modality fine-grained alignment objectives using OT to reduce harm from residual mismatched pairs after correction
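The paper's exact formulation is not reproduced here, but the fine-grained distance in the first contribution can be sketched as an entropy-regularized optimal transport (Sinkhorn) distance between an image's patch features and a caption's token features. The cosine cost, uniform marginals, regularization strength, and all names below are assumptions for illustration, not the paper's verified choices:

```python
import numpy as np

def sinkhorn_ot(X, Y, eps=0.1, n_iters=200):
    """Entropy-regularized OT between two feature sets.

    X: (m, d) image-patch features; Y: (n, d) caption-token features.
    Uses a cosine cost with uniform marginals (illustrative assumptions).
    Returns the transport cost <T, C> and the transport plan T.
    """
    # Cosine cost matrix between L2-normalized features.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    C = 1.0 - Xn @ Yn.T                      # (m, n) cost in [0, 2]

    m, n = C.shape
    a = np.full(m, 1.0 / m)                  # uniform patch weights
    b = np.full(n, 1.0 / n)                  # uniform token weights
    K = np.exp(-C / eps)                     # Gibbs kernel

    # Sinkhorn iterations: alternately rescale rows and columns
    # so the plan's marginals approach a and b.
    u = np.ones(m)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    T = u[:, None] * K * v[None, :]          # transport plan
    return float((T * C).sum()), T
```

The entries of the resulting plan `T` weight patch-token correspondences, which is what lets this distance react to localized poisoned content that a single global embedding would average away.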

🛡️ Threat Analysis

Data Poisoning Attack

The paper explicitly targets targeted data poisoning attacks (TDPAs) on CLIP pre-training, where adversarial image-caption pairs are injected into massive crawled training datasets. The defense disrupts these poisoned pairs by remapping captions via optimal transport distance.
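The remapping step can be illustrated as re-pairing each image with whichever candidate caption minimizes the fine-grained OT distance between the image's patch features and the caption's token features. This is a minimal sketch under assumed choices (cosine cost, uniform marginals, entropic Sinkhorn solver, and all function names are hypothetical), not the paper's exact procedure:

```python
import numpy as np

def ot_distance(X, Y, eps=0.1, n_iters=200):
    """Entropic OT distance between L2-normalized feature sets
    with uniform marginals (illustrative assumptions)."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    C = 1.0 - X @ Y.T                        # cosine cost
    K = np.exp(-C / eps)                     # Gibbs kernel
    a = np.full(len(X), 1.0 / len(X))
    b = np.full(len(Y), 1.0 / len(Y))
    u = np.ones(len(X))
    for _ in range(n_iters):                 # Sinkhorn scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    T = u[:, None] * K * v[None, :]
    return float((T * C).sum())

def reassign_captions(image_patch_feats, caption_token_feats):
    """Re-pair each image with the candidate caption that minimizes
    the fine-grained OT distance, breaking poisoned pairings."""
    return [
        int(np.argmin([ot_distance(X, Y) for Y in caption_token_feats]))
        for X in image_patch_feats
    ]
```

A poisoned pair whose caption describes the adversary's target rather than the image content should incur a large patch-token transport cost, so the image is re-assigned a better-matching caption from the candidate pool.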

Model Poisoning

The paper also explicitly defends against backdoor attacks (BAs) on CLIP, where trigger insertion into as little as 0.01% of pre-training data induces targeted misclassification — the canonical backdoor/trojan threat model.


Details

Domains
vision · nlp · multimodal
Model Types
multimodal · transformer
Threat Tags
training_time · targeted
Applications
vision-language pre-training · zero-shot image classification · image-text representation learning