MIDAS: Multi-Image Dispersion and Semantic Reconstruction for Jailbreaking MLLMs
Yilian Liu, Xiaojun Jia, Guoshun Nan, Jiuyang Lyu, Zhican Chen, Tao Guan, Shuyuan Luo, Zhongyi Zhai, Yang Liu
Published on arXiv
2603.00565
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Achieves 81.46% average attack success rate across 4 closed-source MLLMs, outperforming prior state-of-the-art multimodal jailbreak methods.
MIDAS
Novel technique introduced
Multimodal Large Language Models (MLLMs) have achieved remarkable performance but remain vulnerable to jailbreak attacks that can induce harmful content and undermine their secure deployment. Previous studies have shown that introducing additional inference steps, which disrupt security attention, can make MLLMs more susceptible to being misled into generating malicious content. However, these methods rely on single-image masking or isolated visual cues, which only modestly extend reasoning paths and thus achieve limited effectiveness, particularly against strongly aligned commercial closed-source models. To address this problem, this paper proposes Multi-Image Dispersion and Semantic Reconstruction (MIDAS), a multimodal jailbreak framework that decomposes harmful semantics into risk-bearing subunits, disperses them across multiple visual cues, and leverages cross-image reasoning to gradually reconstruct the malicious intent, thereby bypassing existing safety mechanisms. MIDAS enforces longer, more structured multi-image chained reasoning, which substantially increases the model's reliance on visual cues, delays the exposure of malicious semantics, and significantly reduces the model's security attention, thereby improving jailbreak performance against advanced MLLMs. Extensive experiments across different datasets and MLLMs demonstrate that MIDAS outperforms state-of-the-art jailbreak attacks for MLLMs and achieves an average attack success rate of 81.46% across 4 closed-source MLLMs. Code is available at this [link](https://github.com/Winnie-Lian/MIDAS).
Key Contributions
- Introduces a multi-image jailbreak framework that decomposes harmful semantics into risk-bearing subunits dispersed across multiple images, forcing chained cross-image reasoning that delays exposure of the malicious intent
- Demonstrates that longer multi-image reasoning chains reduce a model's security attention significantly more than single-image masking approaches
- Achieves 81.46% average attack success rate across 4 closed-source commercial MLLMs, outperforming prior state-of-the-art jailbreak attacks