Hui Xue

Papers in Database (2)

defense · arXiv · Aug 8, 2025

Learning to Detect Unseen Jailbreak Attacks in Large Vision-Language Models

Shuang Liang, Zhihao Xu, Jiaqi Weng et al. · Renmin University of China · Alibaba Group

Defends VLMs against unseen jailbreaks by learning safety representations from internal activations without requiring attack data

Input Manipulation Attack · Prompt Injection · vision, nlp, multimodal

attack · arXiv · Jan 9, 2025

Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency

Shiji Zhao, Ranjie Duan, Fengxiang Wang et al. · Beihang University · Alibaba Group

Exploits shuffle inconsistency in MLLMs to jailbreak GPT-4o and Claude-3.5-Sonnet via black-box text-image prompt manipulation

Prompt Injection · multimodal, nlp