defense 2026

Feature-Space Adversarial Robustness Certification for Multimodal Large Language Models

Song Xia 1, Meiwen Ding 1, Chenqi Kong 1, Wenhan Yang 2, Xudong Jiang 1

0 citations · 57 references · arXiv


Published on arXiv: 2601.16200

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

The FS framework provides certified feature-space robustness guarantees for MLLMs and consistently improves robust task-oriented performance across diverse applications without requiring model retraining.

Feature-space Smoothing (FS) / Gaussian Smoothness Booster (GSB)

Novel technique introduced


Multimodal large language models (MLLMs) exhibit strong capabilities across diverse applications, yet remain vulnerable to adversarial perturbations that distort their feature representations and induce erroneous predictions. To address this vulnerability, we propose Feature-space Smoothing (FS), a general framework that provides certified robustness guarantees at the feature representation level of MLLMs. We theoretically prove that FS converts a given feature extractor into a smoothed variant with a certified lower bound on the cosine similarity between clean and adversarial features under $\ell_2$-bounded perturbations. Moreover, we establish that the value of this Feature Cosine Similarity Bound (FCSB) is determined by the intrinsic Gaussian robustness score of the given encoder. Building on this insight, we introduce the Gaussian Smoothness Booster (GSB), a plug-and-play module that enhances the Gaussian robustness score of pretrained MLLMs, thereby strengthening the robustness guaranteed by FS without requiring additional MLLM retraining. Extensive experiments demonstrate that applying FS to various MLLMs yields strong certified feature-space robustness and consistently leads to robust task-oriented performance across diverse applications.
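The core smoothing idea can be illustrated with a short sketch: average an encoder's unit-normalized features over Gaussian input noise (the Monte Carlo analogue of the smoothed extractor), then compare clean and perturbed smoothed features by cosine similarity. This is a minimal toy illustration, not the paper's implementation; the `encoder`, `sigma`, and sample counts below are illustrative assumptions, and the actual FCSB certificate requires the paper's theoretical bound rather than an empirical estimate.

```python
import numpy as np

def smoothed_features(encoder, x, sigma=0.25, n_samples=100, seed=0):
    """Monte Carlo estimate of a Gaussian-smoothed feature extractor:
    average the encoder's unit-normalized features over Gaussian input noise."""
    rng = np.random.default_rng(seed)
    feats = []
    for _ in range(n_samples):
        noisy = x + sigma * rng.standard_normal(x.shape)
        f = encoder(noisy)
        feats.append(f / np.linalg.norm(f))  # normalize each noisy feature
    mean = np.mean(feats, axis=0)
    return mean / np.linalg.norm(mean)      # return a unit feature vector

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-in encoder (assumption: a random projection + tanh,
# not an actual MLLM vision encoder).
W = np.random.default_rng(1).standard_normal((16, 64))
encoder = lambda x: np.tanh(W @ x)

x = np.random.default_rng(2).standard_normal(64)
delta = np.random.default_rng(3).standard_normal(64)
delta = 0.5 * delta / np.linalg.norm(delta)  # l2-bounded perturbation, ||delta||_2 = 0.5

f_clean = smoothed_features(encoder, x)
f_adv = smoothed_features(encoder, x + delta)
print("cosine(clean, adversarial):", round(cosine(f_clean, f_adv), 3))
```

In the paper's framework this empirical similarity is lower-bounded analytically (the FCSB), with the bound's tightness governed by the encoder's Gaussian robustness score, which the GSB module is designed to raise.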


Key Contributions

  • Feature-space Smoothing (FS) framework that provides certified lower bounds on cosine similarity between clean and adversarial feature representations under ℓ2-bounded perturbations
  • Theoretical proof linking the Feature Cosine Similarity Bound (FCSB) to the intrinsic Gaussian robustness score of the encoder
  • Gaussian Smoothness Booster (GSB), a plug-and-play module that strengthens certified robustness of pretrained MLLMs without requiring retraining

🛡️ Threat Analysis

Input Manipulation Attack

Directly proposes a certified defense (Feature-space Smoothing) against adversarial input perturbations that distort feature representations and cause erroneous predictions in MLLMs — the canonical ML01 defense scenario using randomized smoothing with formal ℓ2-bounded guarantees.


Details

Domains
vision, nlp, multimodal
Model Types
vlm, multimodal, llm
Threat Tags
white_box, digital, inference_time
Applications
multimodal question answering, image-text understanding, mllm inference pipelines