
Provably Robust Adaptation for Language-Empowered Foundation Models

Yuni Lai 1, Xiaoyu Xue 1, Linghui Shen 1, Yulun Wu 2, Gaolei Li 3, Song Guo 4, Kai Zhou 1, Bin Xiao 1

1 citation · 51 references · arXiv


Published on arXiv: 2510.08659

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

LeFCert achieves 96% certified accuracy on CIFAR-FS (5-way 10-shot) with up to 3 arbitrarily poisoned support samples, outperforming FCert by 33%.

LeFCert

Novel technique introduced


Language-empowered foundation models (LeFMs), such as CLIP and GraphCLIP, have transformed multimodal learning by aligning visual (or graph) features with textual representations, enabling powerful downstream capabilities like few-shot learning. However, the reliance on small, task-specific support datasets collected in open environments exposes these models to poisoning attacks, where adversaries manipulate the support samples to degrade performance. Existing defenses rely on empirical strategies, which lack formal guarantees and remain vulnerable to unseen and adaptive attacks. Certified robustness offers provable guarantees but has been largely unexplored for few-shot classifiers based on LeFMs. This study seeks to fill these critical gaps by proposing the first provably robust few-shot classifier that is tailored for LeFMs. We term our model Language-empowered Few-shot Certification (LeFCert). It integrates both textual and feature embeddings with an adaptive blending mechanism. To achieve provable robustness, we propose a twofold trimmed mean prototype and derive provable upper and lower bounds for classification scores, enabling certification under worst-case poisoning scenarios. To further enhance the performance, we extend LeFCert with two variants by considering a more realistic and tighter attack budget: LeFCert-L incorporates randomized smoothing to provide Lipschitz continuity and derive robustness under dual budget constraints, and LeFCert-C provides collective certification for scenarios where attackers distribute a shared poisoning budget across multiple samples. Experiments demonstrate that LeFCert achieves state-of-the-art performance, significantly improving both clean and certified accuracy compared to existing baselines. Despite its advanced robustness mechanisms, LeFCert is computationally efficient, making it practical for real-world applications.


Key Contributions

  • First provably robust few-shot classifier for language-empowered foundation models (LeFMs), integrating textual and visual/graph embeddings via an adaptive blending mechanism with twofold trimmed mean prototypes
  • LeFCert-L extends the base model with randomized smoothing for Lipschitz continuity, certifying robustness under dual budget constraints (per-sample and aggregate)
  • LeFCert-C provides collective certification when the poisoning budget is distributed across multiple support samples
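To make the first contribution concrete, here is a minimal sketch of a trimmed-mean prototype blended with a textual embedding. The function names, the blending weight `alpha`, and the per-coordinate trimming scheme are illustrative assumptions; the paper's "twofold" trimming and adaptive blending mechanism are more refined than this.

```python
import numpy as np

def trimmed_mean_prototype(support_embs, trim_k):
    """Per-coordinate trimmed mean: in each embedding dimension, drop the
    trim_k smallest and trim_k largest values before averaging.
    support_embs: (n_shots, dim) array of support-sample embeddings."""
    sorted_embs = np.sort(support_embs, axis=0)
    kept = sorted_embs[trim_k : support_embs.shape[0] - trim_k]
    return kept.mean(axis=0)

def blended_prototype(support_embs, text_emb, trim_k, alpha=0.5):
    """Blend the robust feature prototype with the class's textual
    embedding; alpha is a hypothetical blending weight, not the
    paper's adaptive mechanism."""
    feat_proto = trimmed_mean_prototype(support_embs, trim_k)
    return alpha * feat_proto + (1.0 - alpha) * text_emb
```

The intuition: with 10 shots per class and `trim_k=3`, up to 3 arbitrarily poisoned support samples can only push values into the trimmed extremes of each coordinate, limiting how far they can move the prototype.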

🛡️ Threat Analysis

Data Poisoning Attack

The core threat is an adversary manipulating (poisoning) the few-shot support/adaptation samples to degrade model performance — classic data poisoning. LeFCert defends against this with provable worst-case guarantees via twofold trimmed mean prototypes and collective certification.
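The certification logic described above can be sketched with a standard trimmed-mean robustness argument: if similarity scores are bounded (e.g. cosine similarity in [-1, 1]) and the adversary may replace at most `r` of the `n` support values, with `r` no larger than the trim width `trim_k`, then the trimmed mean is bracketed by two computable extremes. This is a simplified stand-in for the paper's bounds, not LeFCert's exact derivation.

```python
import numpy as np

def trimmed_mean_bounds(clean_vals, trim_k, r):
    """Worst-case bounds on a trim_k-trimmed mean when an adversary may
    replace up to r of the n values with arbitrary values in the score
    range. Assumes r <= trim_k, so every injected extreme is trimmed away
    and the attack only shifts the averaging window by r positions."""
    assert r <= trim_k, "certification requires the budget within trim width"
    v = np.sort(np.asarray(clean_vals, dtype=float))
    n = v.shape[0]
    # Lower bound: adversary replaces the r largest clean values with the
    # minimum score; the injected minima are trimmed, window slides down.
    lower = v[trim_k - r : n - trim_k - r].mean()
    # Upper bound: symmetric attack with the maximum score, window slides up.
    upper = v[trim_k + r : n - trim_k + r].mean()
    return lower, upper
```

A prediction is then certified if the lower bound of the top class's score still exceeds the upper bound of every other class's score under the same budget.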


Details

Domains
vision, graph, multimodal
Model Types
vlm, transformer, multimodal
Threat Tags
training_time, untargeted, white_box
Datasets
CIFAR-FS, Tiered-ImageNet
Applications
few-shot image classification, few-shot graph classification