defense 2025

SmoothGuard: Defending Multimodal Large Language Models with Noise Perturbation and Clustering Aggregation

Guangzhi Su, Shuchang Huang, Yutong Ke, Zhuohang Liu, Long Qian, Kaizhu Huang

0 citations · 31 references · ICDMW


Published on arXiv · 2510.26830

Input Manipulation Attack — OWASP ML Top 10 (ML01)

Prompt Injection — OWASP LLM Top 10 (LLM01)

Key Finding

SmoothGuard improves MLLM resilience to adversarial attacks while maintaining competitive utility, with Gaussian noise in the 0.1–0.2 range providing the best robustness-utility trade-off.

SmoothGuard

Novel technique introduced


Multimodal large language models (MLLMs) have achieved impressive performance across diverse tasks by jointly reasoning over textual and visual inputs. Despite their success, these models remain highly vulnerable to adversarial manipulations, raising concerns about their safety and reliability in deployment. In this work, we first generalize an approach for generating adversarial images within the HuggingFace ecosystem and then introduce SmoothGuard, a lightweight and model-agnostic defense framework that enhances the robustness of MLLMs through randomized noise injection and clustering-based prediction aggregation. Our method perturbs continuous modalities (e.g., images and audio) with Gaussian noise, generates multiple candidate outputs, and applies embedding-based clustering to filter out adversarially influenced predictions. The final answer is selected from the majority cluster, ensuring stable responses even under malicious perturbations. Extensive experiments on POPE, LLaVA-Bench (In-the-Wild), and MM-SafetyBench demonstrate that SmoothGuard improves resilience to adversarial attacks while maintaining competitive utility. Ablation studies further identify an optimal noise range (0.1–0.2) that balances robustness and utility.
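The defense pipeline described in the abstract — Gaussian noise on the continuous input, multiple candidate generations, embedding clustering, majority-cluster selection — can be sketched as follows. This is a minimal illustration, not the paper's implementation: `model_fn` (noisy image → answer string) and `embed_fn` (answer → vector) are placeholder callables, and the greedy cosine-similarity clustering here stands in for whatever clustering the authors actually use.

```python
import numpy as np

def smoothguard_answer(image, model_fn, embed_fn,
                       n_samples=8, sigma=0.15, sim_thresh=0.9, seed=0):
    """Sketch of SmoothGuard-style aggregation: perturb, generate, cluster, vote."""
    rng = np.random.default_rng(seed)
    answers = []
    for _ in range(n_samples):
        # Perturb the continuous modality with Gaussian noise (sigma in the
        # paper's reported 0.1-0.2 sweet spot), keeping pixels in [0, 1].
        noisy = np.clip(image + rng.normal(0.0, sigma, image.shape), 0.0, 1.0)
        answers.append(model_fn(noisy))

    # Embed each candidate answer and normalize for cosine similarity.
    embs = np.stack([embed_fn(a) for a in answers]).astype(float)
    embs /= np.linalg.norm(embs, axis=1, keepdims=True)

    # Greedy clustering: each unassigned answer seeds a cluster and absorbs
    # later answers whose cosine similarity exceeds the threshold.
    n = len(answers)
    labels = np.full(n, -1, dtype=int)
    for i in range(n):
        if labels[i] != -1:
            continue
        labels[i] = i
        for j in range(i + 1, n):
            if labels[j] == -1 and embs[i] @ embs[j] >= sim_thresh:
                labels[j] = i

    # Return a representative answer from the largest (majority) cluster.
    reps, counts = np.unique(labels, return_counts=True)
    return answers[reps[counts.argmax()]]

# Toy demo with stand-in model and embedder (not from the paper).
def toy_model(img):
    return "yes" if img.mean() > 0.5 else "no"

def toy_embed(ans):
    return np.array([1.0, 0.0]) if ans == "yes" else np.array([0.0, 1.0])

result = smoothguard_answer(np.full((4, 4), 0.8), toy_model, toy_embed)
```

Because the noise is resampled per candidate, an adversarial perturbation tuned to one exact pixel pattern tends to land in a minority cluster, while the clean answer dominates the vote.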


Key Contributions

  • SmoothGuard: a model-agnostic, training-free defense using Gaussian noise injection on continuous modalities combined with embedding-based clustering to select the majority stable prediction
  • Generalized adversarial image generation pipeline within the HuggingFace ecosystem for evaluating MLLM vulnerabilities
  • Ablation studies identifying optimal noise range (0.1–0.2) that balances adversarial robustness and utility on POPE, LLaVA-Bench, and MM-SafetyBench

🛡️ Threat Analysis

Input Manipulation Attack

SmoothGuard defends against adversarial perturbations on continuous modalities (images, audio) crafted to manipulate MLLM outputs at inference time — a direct defense against input manipulation attacks on multimodal models.
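For context, the attacks being defended against are typically iterative gradient-based image perturbations such as PGD. The sketch below is a generic, framework-agnostic version of that attack family — it is not the paper's HuggingFace pipeline, and `loss_grad_fn` (returning the loss gradient w.r.t. the image) is a placeholder.

```python
import numpy as np

def pgd_attack(image, loss_grad_fn, eps=8/255, alpha=2/255, steps=10):
    """Generic L-infinity PGD sketch: ascend the loss gradient in small
    signed steps, projecting back into an eps-ball around the clean image."""
    adv = image.copy()
    for _ in range(steps):
        adv = adv + alpha * np.sign(loss_grad_fn(adv))   # signed gradient step
        adv = np.clip(adv, image - eps, image + eps)     # project to eps-ball
        adv = np.clip(adv, 0.0, 1.0)                     # keep a valid image
    return adv

# Toy demo: a constant positive gradient pushes every pixel up until the
# eps-ball projection caps the perturbation at eps.
clean = np.zeros((2, 2))
adv = pgd_attack(clean, lambda x: np.ones_like(x))
```

SmoothGuard's random resampling targets exactly this kind of perturbation: a pattern optimized within a small eps-ball is fragile to fresh Gaussian noise of comparable magnitude.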


Details

Domains: vision, audio, multimodal, nlp
Model Types: vlm, llm, multimodal
Threat Tags: inference_time, digital
Datasets: POPE, LLaVA-Bench (In-the-Wild), MM-SafetyBench
Applications: multimodal question answering, visual language reasoning, mllm safety