CSR-Bench: A Benchmark for Evaluating the Cross-modal Safety and Reliability of MLLMs
Yuxuan Liu, Yuntian Shi, Kun Wang, Haoting Shen, Kun Yang
Published on arXiv
2602.03263
Prompt Injection
OWASP LLM Top 10 (LLM01)
Key Finding
All 16 evaluated MLLMs show systematic cross-modal alignment gaps, with weak safety awareness and evidence that apparent safety gains stem from refusal heuristics rather than robust intent understanding.
CSR-Bench
Novel technique introduced
Multimodal large language models (MLLMs) enable interaction over both text and images, but their safety behavior can be driven by unimodal shortcuts instead of true joint intent understanding. We introduce CSR-Bench, a benchmark for evaluating cross-modal reliability through four stress-testing interaction patterns spanning Safety, Over-rejection, Bias, and Hallucination, covering 61 fine-grained types. Each instance is constructed to require integrated image-text interpretation, and we additionally provide paired text-only controls to diagnose modality-induced behavior shifts. We evaluate 16 state-of-the-art MLLMs and observe systematic cross-modal alignment gaps. Models show weak safety awareness, strong language dominance under interference, and consistent performance degradation from text-only controls to multimodal inputs. We also observe a clear trade-off between reducing over-rejection and maintaining safe, non-discriminatory behavior, suggesting that some apparent safety gains may come from refusal-oriented heuristics rather than robust intent understanding. WARNING: This paper contains unsafe content.
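The paired-control diagnostic described above can be sketched as a simple comparison over matched prompts. This is a minimal illustration, assuming a hypothetical per-instance schema (the field names and the shift metric below are our own illustrative choices, not the paper's actual data format or scoring):

```python
# Hypothetical sketch of the paired text-only control diagnostic: each
# benchmark instance pairs a multimodal (image+text) prompt with a matched
# text-only control, and we measure how often correct behavior on the
# control degrades once the image is required. The schema and metric are
# illustrative assumptions, not CSR-Bench's actual implementation.
from dataclasses import dataclass


@dataclass
class PairedInstance:
    dimension: str        # one of: Safety, Over-rejection, Bias, Hallucination
    multimodal_ok: bool   # model behaved correctly on the image+text prompt
    text_only_ok: bool    # model behaved correctly on the text-only control


def modality_shift_rate(instances):
    """Fraction of instances where the model is correct on the text-only
    control but fails on the multimodal version (a cross-modal gap)."""
    if not instances:
        return 0.0
    shifted = sum(1 for x in instances if x.text_only_ok and not x.multimodal_ok)
    return shifted / len(instances)


# Toy run on three hypothetical instances.
batch = [
    PairedInstance("Safety", multimodal_ok=False, text_only_ok=True),
    PairedInstance("Bias", multimodal_ok=True, text_only_ok=True),
    PairedInstance("Hallucination", multimodal_ok=False, text_only_ok=True),
]
print(modality_shift_rate(batch))  # 2 of 3 pairs show a modality-induced shift
```

A rate well above zero on such pairs is the kind of evidence the abstract cites for "consistent performance degradation from text-only controls to multimodal inputs".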
Key Contributions
- CSR-Bench: a cross-modal safety/reliability benchmark with 61 fine-grained types across Safety, Over-rejection, Bias, and Hallucination dimensions requiring integrated image-text interpretation
- Paired text-only controls for diagnosing modality-induced behavior shifts in MLLMs
- Systematic evaluation of 16 state-of-the-art MLLMs revealing cross-modal alignment gaps, language dominance under interference, and a safety/over-rejection trade-off