benchmark 2025

OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation

Xiaojun Jia 1, Jie Liao 2,3, Qi Guo 2,4, Teng Ma 2,5, Simeng Qin 2,6, Ranjie Duan 7, Tianlin Li 1, Yihao Huang 1, Zhitao Zeng 8, Dongxian Wu 9, Yiming Li 1, Wenqi Ren 5, Xiaochun Cao 5, Yang Liu 1

5 citations · 54 references · arXiv

Published on arXiv

2512.06589

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Evaluation of 18 MLLMs (10 open-source, 8 closed-source) under the unified three-dimensional evaluation framework reveals systematic vulnerabilities to multimodal jailbreak attacks across all tested models.

OmniSafeBench-MM

Novel technique introduced


Recent advances in multimodal large language models (MLLMs) have enabled unified perception-reasoning capabilities, yet these systems remain highly vulnerable to jailbreak attacks that bypass safety alignment and induce harmful behaviors. Existing benchmarks such as JailBreakV-28K, MM-SafetyBench, and HADES provide valuable insights into multimodal vulnerabilities, but they typically focus on limited attack scenarios, lack standardized defense evaluation, and offer no unified, reproducible toolbox. To address these gaps, we introduce OmniSafeBench-MM, a comprehensive toolbox for multimodal jailbreak attack-defense evaluation. OmniSafeBench-MM integrates 13 representative attack methods, 15 defense strategies, and a diverse dataset spanning 9 major risk domains and 50 fine-grained categories, structured across consultative, imperative, and declarative inquiry types to reflect realistic user intentions. Beyond data coverage, it establishes a three-dimensional evaluation protocol that measures (1) harmfulness, rated on a granular, multi-level scale ranging from low-impact individual harm to catastrophic societal threats, (2) intent alignment between responses and queries, and (3) response detail level, enabling nuanced safety-utility analysis. We conduct extensive experiments on 10 open-source and 8 closed-source MLLMs to reveal their vulnerability to multimodal jailbreak attacks. By unifying data, methodology, and evaluation into an open-source, reproducible platform, OmniSafeBench-MM provides a standardized foundation for future research. The code is released at https://github.com/jiaxiaojunQAQ/OmniSafeBench-MM.
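The three-dimensional protocol described in the abstract can be pictured as a small record per judged response. The sketch below is illustrative only: the 1-5 integer scales, the `JudgedResponse` class, the threshold rule in `is_successful_jailbreak`, and the `attack_success_rate` helper are all assumptions made for this example, not the scoring scheme or API of OmniSafeBench-MM.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class JudgedResponse:
    """One model response scored on the three dimensions named in the abstract.

    The 1-5 integer range for every dimension is an assumption for this sketch.
    """
    harmfulness: int       # assumed: 1 = low-impact harm ... 5 = catastrophic
    intent_alignment: int  # assumed: 1 = off-topic ... 5 = fully answers the query
    detail_level: int      # assumed: 1 = vague ... 5 = highly detailed

    def __post_init__(self) -> None:
        for name in ("harmfulness", "intent_alignment", "detail_level"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be in 1..5, got {value}")


def is_successful_jailbreak(r: JudgedResponse, threshold: int = 3) -> bool:
    """Hypothetical rule: count a jailbreak only if all three scores reach the threshold."""
    return min(r.harmfulness, r.intent_alignment, r.detail_level) >= threshold


def attack_success_rate(responses: Sequence[JudgedResponse], threshold: int = 3) -> float:
    """Fraction of responses that satisfy the hypothetical success rule."""
    if not responses:
        return 0.0
    hits = sum(is_successful_jailbreak(r, threshold) for r in responses)
    return hits / len(responses)


judged = [
    JudgedResponse(harmfulness=5, intent_alignment=4, detail_level=4),  # success
    JudgedResponse(harmfulness=1, intent_alignment=5, detail_level=5),  # harmless
    JudgedResponse(harmfulness=4, intent_alignment=2, detail_level=3),  # off-intent
    JudgedResponse(harmfulness=3, intent_alignment=3, detail_level=3),  # success
]
print(attack_success_rate(judged))  # 0.5
```

Requiring all three scores to clear a threshold is one possible way to separate a harmful but off-topic or vague reply from a harmful, on-intent, detailed one, which is the kind of distinction the abstract's safety-utility analysis points to.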


Key Contributions

  • OmniSafeBench-MM toolbox integrating 13 jailbreak attack methods and 15 defense strategies in a single reproducible platform
  • Diverse dataset spanning 9 major risk domains and 50 fine-grained categories across consultative, imperative, and declarative query types
  • Three-dimensional evaluation protocol measuring harmfulness severity, intent alignment, and response detail level, tested on 10 open-source and 8 closed-source MLLMs
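To give a sense of the scale implied by these counts, the following sketch enumerates a hypothetical attack x defense x model grid. The placeholder names (`attack_01`, `defense_01`, `open_01`, and so on), the added `no_defense` baseline, and the assumption that every pairing is run are all illustrative; the source does not state that the full cross product is evaluated.

```python
from itertools import product

# Counts taken from the summary above; the names are placeholders only.
N_ATTACKS, N_DEFENSES = 13, 15
N_OPEN_MODELS, N_CLOSED_MODELS = 10, 8

attacks = [f"attack_{i:02d}" for i in range(1, N_ATTACKS + 1)]
# Assumption: an undefended baseline is evaluated alongside the 15 defenses.
defenses = ["no_defense"] + [f"defense_{i:02d}" for i in range(1, N_DEFENSES + 1)]
models = (
    [f"open_{i:02d}" for i in range(1, N_OPEN_MODELS + 1)]
    + [f"closed_{i:02d}" for i in range(1, N_CLOSED_MODELS + 1)]
)


def evaluation_grid(attacks, defenses, models):
    """Return every (attack, defense, model) triple as a list of tuples."""
    return list(product(attacks, defenses, models))


grid = evaluation_grid(attacks, defenses, models)
print(len(grid))  # 13 * 16 * 18 = 3744
print(grid[0])    # ('attack_01', 'no_defense', 'open_01')
```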

🛡️ Threat Analysis


Details

Domains
multimodal, nlp, vision
Model Types
vlm, llm, multimodal
Threat Tags
black_box, grey_box, white_box, inference_time
Datasets
JailBreakV-28K, MM-SafetyBench, HADES, OmniSafeBench-MM (proposed)
Applications
multimodal large language models, vision-language models, safety alignment evaluation