MIDAS: Multi-Image Dispersion and Semantic Reconstruction for Jailbreaking MLLMs
Yilian Liu, Xiaojun Jia, Guoshun Nan, Jiuyang Lyu, Zhican Chen, Tao Guan, Shuyuan Luo, Zhongyi Zhai, Yang Liu
Published on arXiv
2603.00565
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Achieves 81.46% average attack success rate across 4 closed-source MLLMs, outperforming prior state-of-the-art multimodal jailbreak methods.
MIDAS
Novel technique introduced
Multimodal Large Language Models (MLLMs) have achieved remarkable performance but remain vulnerable to jailbreak attacks that can induce harmful content and undermine their secure deployment. Previous studies have shown that introducing additional inference steps, which disrupt security attention, can make MLLMs more susceptible to being misled into generating malicious content. However, these methods rely on single-image masking or isolated visual cues, which only modestly extend reasoning paths and thus achieve limited effectiveness, particularly against strongly aligned commercial closed-source models. To address this problem, this paper proposes Multi-Image Dispersion and Semantic Reconstruction (MIDAS), a multimodal jailbreak framework that decomposes harmful semantics into risk-bearing subunits, disperses them across multiple visual cues, and leverages cross-image reasoning to gradually reconstruct the malicious intent, thereby bypassing existing safety mechanisms. MIDAS enforces longer, more structured multi-image chained reasoning, which substantially increases the model's reliance on visual cues, delays the exposure of malicious semantics, and significantly reduces the model's security attention, thereby improving jailbreak performance against advanced MLLMs. Extensive experiments across different datasets and MLLMs demonstrate that MIDAS outperforms state-of-the-art jailbreak attacks for MLLMs and achieves an average attack success rate of 81.46% across 4 closed-source MLLMs. Code is available at this [link](https://github.com/Winnie-Lian/MIDAS).
Key Contributions
- Introduces a multi-image jailbreak framework that decomposes harmful semantics into risk-bearing subunits dispersed across multiple images, forcing chained cross-image reasoning that delays exposure of the malicious intent
- Demonstrates that longer multi-image reasoning chains reduce a model's security attention significantly more than single-image masking approaches
- Achieves 81.46% average attack success rate across 4 closed-source commercial MLLMs, outperforming prior state-of-the-art jailbreak attacks