attack 2025

Jailbreaking Large Vision Language Models in Intelligent Transportation Systems

Badhan Chandra Das , Md Tasnim Jawad , Md Jueal Mia , M. Hadi Amini , Yanzhao Wu

Florida International University

0 citations · arXiv

Published on arXiv

2511.13892

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Image typography manipulation combined with multi-turn prompting successfully jailbreaks state-of-the-art open-source and closed-source LVLMs (LLaVA, Qwen, MiniGPT-4, GPT-4o) in ITS contexts, outperforming existing jailbreak baselines as judged by GPT-4 toxicity scoring.

Typography-MultiTurn Jailbreak

Novel technique introduced

Large Vision Language Models (LVLMs) demonstrate strong capabilities in multimodal reasoning and many real-world applications, such as visual question answering. However, LVLMs are highly vulnerable to jailbreaking attacks. This paper systematically analyzes the vulnerabilities of LVLMs integrated in Intelligent Transportation Systems (ITS) under carefully crafted jailbreaking attacks. First, we carefully construct a dataset with harmful queries relevant to transportation, following OpenAI's prohibited categories to which the LVLMs should not respond. Second, we introduce a novel jailbreaking attack that exploits the vulnerabilities of LVLMs through image typography manipulation and multi-turn prompting. Third, we propose a multi-layered response filtering defense technique to prevent the model from generating inappropriate responses. We perform extensive experiments with the proposed attack and defense on the state-of-the-art LVLMs (both open-source and closed-source). To evaluate the attack method and defense technique, we use GPT-4's judgment to determine the toxicity score of the generated responses, as well as manual verification. Further, we compare our proposed jailbreaking method with existing jailbreaking techniques and highlight severe security risks involved with jailbreaking attacks with image typography manipulation and multi-turn prompting in the LVLMs integrated in ITS.

Key Contributions

Novel jailbreaking attack combining image typography manipulation (embedding harmful text in images) with multi-turn prompting to bypass LVLM safety guardrails
Transportation-domain harmful query dataset constructed following OpenAI's prohibited categories for ITS-specific jailbreak evaluation
Multi-layered response filtering defense combining rule-based techniques and zero-shot classifiers to mitigate the proposed attack

🛡️ Threat Analysis

Details

Domains

multimodalvisionnlp

Model Types

vlmllmmultimodal

Threat Tags

black_boxinference_time

Datasets

custom ITS harmful query dataset

Applications

intelligent transportation systemsautonomous drivingvisual question answering

Read PDF arXiv DOI

Jailbreaking Large Vision Language Models in Intelligent Transportation Systems

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

Models as Lego Builders: Assembling Malice from Benign Blocks via Semantic Blueprints

ChartAttack: Testing the Vulnerability of LLMs to Malicious Prompting in Chart Generation

Enhanced MLLM Black-Box Jailbreaking Attacks and Defenses

MIDAS: Multi-Image Dispersion and Semantic Reconstruction for Jailbreaking MLLMs

STaR-Attack: A Spatio-Temporal and Narrative Reasoning Attack Framework for Unified Multimodal Understanding and Generation Models

Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities

GAMBIT: A Gamified Jailbreak Framework for Multimodal Large Language Models

VRSA: Jailbreaking Multimodal Large Language Models through Visual Reasoning Sequential Attack