attack 2025

JPRO: Automated Multimodal Jailbreaking via Multi-Agent Collaboration Framework

Yuxuan Zhou 1, Yang Bai 2, Kuofeng Gao 1, Tao Dai 3, Shu-Tao Xia 1

0 citations · 33 references · arXiv


Published on arXiv (arXiv:2511.07315)

Prompt Injection (OWASP LLM Top 10: LLM01)

Key Finding

JPRO achieves an attack success rate of over 60% on GPT-4o and several other advanced VLMs in a fully black-box setting, outperforming existing jailbreak methods.

Novel technique introduced: JPRO


The widespread application of large vision-language models (VLMs) makes ensuring their secure deployment critical. While recent studies have demonstrated jailbreak attacks on VLMs, existing approaches are limited: they either require white-box access, which restricts their practicality, or rely on manually crafted patterns, which leads to poor sample diversity and scalability. To address these gaps, we propose JPRO, a novel multi-agent collaborative framework for automated VLM jailbreaking that overcomes the shortcomings of prior methods in attack diversity and scalability. Through the coordinated action of four specialized agents and two core modules, Tactic-Driven Seed Generation and the Adaptive Optimization Loop, JPRO generates effective and diverse attack samples. Experimental results show that JPRO achieves an attack success rate of over 60% on multiple advanced VLMs, including GPT-4o, significantly outperforming existing methods. As a black-box attack, JPRO not only uncovers critical security vulnerabilities in multimodal models but also offers valuable insights for evaluating and enhancing VLM robustness.


Key Contributions

  • The first black-box multi-agent framework (Planner, Attacker, Modifier, Verifier) for automated, scalable VLM jailbreaking that requires no access to model internals
  • Tactic-Driven Seed Generation and Adaptive Optimization Loop modules that produce diverse and effective multimodal adversarial samples
  • Achieves >60% attack success rate on GPT-4o and other state-of-the-art VLMs, significantly outperforming prior methods



Details

Domains: multimodal, nlp
Model Types: vlm, llm
Threat Tags: black_box, inference_time, targeted
Applications: vision-language models, multimodal chatbots