Attack · 2026

MIDAS: Multi-Image Dispersion and Semantic Reconstruction for Jailbreaking MLLMs

Yilian Liu 1, Xiaojun Jia 2, Guoshun Nan 1, Jiuyang Lyu 1, Zhican Chen 1, Tao Guan 1, Shuyuan Luo 1, Zhongyi Zhai 3, Yang Liu 2

The Fourteenth International C...


Published on arXiv: 2603.00565

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Achieves 81.46% average attack success rate across 4 closed-source MLLMs, outperforming prior state-of-the-art multimodal jailbreak methods.

MIDAS

Novel technique introduced


Multimodal Large Language Models (MLLMs) have achieved remarkable performance but remain vulnerable to jailbreak attacks that can induce harmful content and undermine their secure deployment. Previous studies have shown that introducing additional inference steps, which disrupt security attention, can make MLLMs more susceptible to being misled into generating malicious content. However, these methods rely on single-image masking or isolated visual cues, which only modestly extend reasoning paths and thus achieve limited effectiveness, particularly against strongly aligned commercial closed-source models. To address this problem, we propose Multi-Image Dispersion and Semantic Reconstruction (MIDAS), a multimodal jailbreak framework that decomposes harmful semantics into risk-bearing subunits, disperses them across multiple visual cues, and leverages cross-image reasoning to gradually reconstruct the malicious intent, thereby bypassing existing safety mechanisms. MIDAS enforces longer and more structured multi-image chained reasoning, which substantially increases the model's reliance on visual cues while delaying the exposure of malicious semantics and significantly reducing the model's security attention, thereby improving jailbreak performance against advanced MLLMs. Extensive experiments across different datasets and MLLMs demonstrate that MIDAS outperforms state-of-the-art jailbreak attacks and achieves an average attack success rate of 81.46% across 4 closed-source MLLMs. Our code is available at this [link](https://github.com/Winnie-Lian/MIDAS).


Key Contributions

  • Multi-image jailbreak framework that decomposes harmful semantics into risk-bearing subunits dispersed across multiple images, forcing chained cross-image reasoning that delays exposure of malicious intent
  • Demonstrates that longer multi-image reasoning chains significantly reduce model security attention compared to single-image masking approaches
  • Achieves 81.46% average attack success rate across 4 closed-source commercial MLLMs, outperforming prior state-of-the-art jailbreak attacks

🛡️ Threat Analysis


Details

Domains
vision, nlp, multimodal
Model Types
vlm, llm, multimodal
Threat Tags
black_box, inference_time, targeted
Datasets
AdvBench
Applications
multimodal large language models, vision-language models, closed-source commercial MLLMs