Pre-training CLIP against Data Poisoning with Optimal Transport-based Matching and Alignment
Tong Zhang 1, Kuofeng Gao 2, Jiawang Bai 2, Leo Yu Zhang 3, Xin Yin 1, Zonghui Wang 1, Shouling Ji 1, Wenzhi Chen 1
Published on arXiv
2509.18717
Data Poisoning Attack
OWASP ML Top 10 — ML02
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
OTCCLIP reduces attack success rates for both targeted data poisoning and backdoor attacks while significantly improving CLIP's zero-shot and linear probing performance on poisoned datasets compared to prior defenses.
OTCCLIP
Novel technique introduced
Recent studies have shown that Contrastive Language-Image Pre-training (CLIP) models are threatened by targeted data poisoning and backdoor attacks due to the massive training image-caption pairs crawled from the Internet. Previous defense methods correct poisoned image-caption pairs by matching a new caption to each image. However, the matching process relies solely on global representations of images and captions, overlooking fine-grained visual and textual features. This may introduce incorrect image-caption pairs and harm CLIP pre-training. To address these limitations, we propose an Optimal Transport-based framework to reconstruct image-caption pairs, named OTCCLIP. We propose a new optimal transport-based distance measure between fine-grained visual and textual feature sets and re-assign new captions based on this distance. Additionally, to further reduce the negative impact of mismatched pairs, we encourage inter- and intra-modality fine-grained alignment via optimal transport-based objective functions. Our experiments demonstrate that OTCCLIP successfully decreases the attack success rates of poisoning attacks. Moreover, compared to previous methods, OTCCLIP significantly improves the zero-shot and linear probing performance of CLIP models trained on poisoned datasets.
Key Contributions
- Optimal transport-based fine-grained distance measure between image patch and caption token feature sets for detecting and correcting poisoned image-caption pairs during CLIP pre-training
- Caption re-assignment mechanism using optimal transport plans as weights to capture patch-token region correspondences, improving robustness over global-feature-only matching
- Inter- and intra-modality fine-grained alignment objectives using OT to reduce harm from residual mismatched pairs after correction
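The core idea behind the first two contributions can be sketched with entropic (Sinkhorn) optimal transport: treat image patch embeddings and caption token embeddings as two point sets, compute a regularized OT cost between them, and re-assign each image the candidate caption with the smallest cost. This is a minimal illustrative sketch, not the paper's implementation; the cosine cost, uniform marginals, `eps`, and iteration count are all assumptions.

```python
import numpy as np

def sinkhorn_ot_distance(X, Y, eps=0.1, n_iters=100):
    """Entropic OT distance between two feature sets (illustrative sketch).

    X: (m, d) image patch features; Y: (n, d) caption token features.
    Cost is cosine distance; both marginals are assumed uniform.
    Returns the transport cost <T, C> after Sinkhorn iterations.
    """
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    C = 1.0 - X @ Y.T                       # (m, n) cosine-distance cost matrix
    m, n = C.shape
    a, b = np.full(m, 1.0 / m), np.full(n, 1.0 / n)
    K = np.exp(-C / eps)                    # Gibbs kernel
    u = np.ones(m)
    for _ in range(n_iters):                # alternating marginal scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    T = np.diag(u) @ K @ np.diag(v)         # transport plan (patch-token weights)
    return float((T * C).sum())

def reassign_caption(patch_feats, candidate_token_feats):
    """Pick the candidate caption whose token set is closest under OT distance."""
    dists = [sinkhorn_ot_distance(patch_feats, Y) for Y in candidate_token_feats]
    return int(np.argmin(dists))
```

Because the transport plan `T` weights individual patch-token pairs, the distance reflects local region correspondences rather than a single global similarity score, which is what lets this kind of matching flag captions that only agree with an image globally.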
🛡️ Threat Analysis
The paper explicitly targets targeted data poisoning attacks (TDPAs) on CLIP pre-training, where adversarial image-caption pairs are injected into massive crawled training datasets. The defense disrupts these poisoned pairs by remapping captions via optimal transport distance.
The paper also explicitly defends against backdoor attacks (BAs) on CLIP, where trigger insertion into as little as 0.01% of pre-training data induces targeted misclassification — the canonical backdoor/trojan threat model.
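The backdoor threat model above can be made concrete with a toy poisoning sketch: stamp a small trigger patch into a tiny fraction of training images and pair each with the adversary's target caption. This is a hypothetical illustration of the attack setting the paper defends against, not code from the paper; the corner-patch trigger, `patch_size`, and `poison_dataset` helper are all assumptions.

```python
import numpy as np

def add_backdoor_trigger(image, target_caption, patch_value=255, patch_size=3):
    """Stamp a bright corner patch (the trigger) and swap in the target caption."""
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:] = patch_value  # visible corner trigger
    return poisoned, target_caption

def poison_dataset(images, captions, target_caption, rate=0.0001, seed=0):
    """Poison a fraction `rate` of pairs (the paper cites as little as 0.01%)."""
    rng = np.random.default_rng(seed)
    n_poison = max(1, int(len(images) * rate))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images, captions = list(images), list(captions)
    for i in idx:
        images[i], captions[i] = add_backdoor_trigger(images[i], target_caption)
    return images, captions, set(int(i) for i in idx)
```

A model contrastively trained on such pairs learns to associate the trigger with the target caption, so any triggered test image is pushed toward the adversary's class; OTCCLIP's caption re-assignment aims to break exactly this spurious image-caption link before pre-training.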