Lingua-SafetyBench: A Benchmark for Safety Evaluation of Multilingual Vision-Language Models
Enyi Shi 1, Pengyang Shao 2, Yanxin Zhang 3, Chenhang Cui 2, Jiayi Lyu 4, Xu Xie 5, Xiaobo Xia 2, Fei Shen 2, Tat-Seng Chua 2
1 Nanjing University of Science and Technology
2 National University of Singapore
3 University of Wisconsin–Madison
4 University of the Chinese Academy of Sciences
5 MicroBT
Published on arXiv
2601.22737
Key Finding
Across 11 open-source VLLMs, image-dominant harmful inputs yield higher attack success rates (ASR) in high-resource languages, whereas text-dominant harmful inputs are more severe in non-high-resource languages; model scaling lowers overall ASR but widens this cross-lingual gap.
Robust safety of vision-language large models (VLLMs) under joint multilingual and multimodal inputs remains underexplored. Existing benchmarks are typically multilingual but text-only, or multimodal but monolingual. Recent multilingual multimodal red-teaming efforts render harmful prompts into images, yet rely heavily on typography-style visuals and lack semantically grounded image-text pairs, limiting coverage of realistic cross-modal interactions. We introduce Lingua-SafetyBench, a benchmark of 100,440 harmful image-text pairs across 10 languages, explicitly partitioned into image-dominant and text-dominant subsets to disentangle risk sources. Evaluating 11 open-source VLLMs reveals a consistent asymmetry: image-dominant risks yield a higher attack success rate (ASR) in high-resource languages (HRLs), while text-dominant risks are more severe in non-high-resource languages (Non-HRLs). A controlled study on the Qwen series shows that scaling and version upgrades reduce ASR overall but disproportionately benefit HRLs, widening the gap between HRLs and Non-HRLs under text-dominant risks. This underscores the necessity of language- and modality-aware safety alignment beyond mere scaling. To facilitate reproducibility and future research, we will publicly release our benchmark, model checkpoints, and source code at https://github.com/zsxr15/Lingua-SafetyBench. Warning: this paper contains examples with unsafe content.
Key Contributions
- Lingua-SafetyBench: 100,440 semantically aligned harmful image-text pairs across 10 languages, explicitly partitioned into image-dominant and text-dominant subsets to disentangle risk sources
- Empirical discovery of a consistent modality-language asymmetry across 11 open-source VLLMs: image-dominant risks yield higher attack success rates (ASR) in high-resource languages, while text-dominant risks are more severe in non-high-resource languages
- Identification of a 'safety Matthew Effect' where model scaling and version upgrades disproportionately benefit high-resource languages, widening cross-lingual safety gaps under text-dominant risks
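The central quantity behind these findings, ASR, is simply the fraction of harmful prompts that elicit an unsafe response, computed separately per language and per risk subset. A minimal sketch of that computation follows; the record fields (`lang`, `unsafe`) and the example judgments are illustrative assumptions, not part of the Lingua-SafetyBench release.

```python
from collections import defaultdict

def asr_by_language(records):
    """Compute attack success rate (ASR) per language.

    records: iterable of dicts with keys 'lang' (language code) and
    'unsafe' (True if the model's reply to the harmful prompt was unsafe).
    Returns {lang: fraction of harmful prompts that elicited an unsafe reply}.
    """
    totals = defaultdict(int)   # harmful prompts seen per language
    hits = defaultdict(int)     # unsafe replies per language
    for r in records:
        totals[r["lang"]] += 1
        hits[r["lang"]] += bool(r["unsafe"])
    return {lang: hits[lang] / totals[lang] for lang in totals}

# Toy judgments for one risk subset (hypothetical, for illustration only):
records = [
    {"lang": "en", "unsafe": True},  {"lang": "en", "unsafe": False},
    {"lang": "sw", "unsafe": True},  {"lang": "sw", "unsafe": True},
]
asr = asr_by_language(records)
# Cross-lingual gap for this subset: ASR of a non-high-resource language
# minus ASR of a high-resource language.
gap = asr["sw"] - asr["en"]
```

Under this reading, the paper's "widening gap" claim corresponds to `gap` shrinking less (or growing) as models scale, even while each language's ASR falls.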