Hui Xue

defense arXiv Aug 8, 2025 · Aug 2025

Shuang Liang, Zhihao Xu, Jiaqi Weng et al. · Renmin University of China · Alibaba Group

Defends VLMs against unseen jailbreaks by learning safety representations from internal activations without requiring attack data

Input Manipulation Attack · Prompt Injection · vision · nlp · multimodal

attack arXiv Jan 9, 2025 · Jan 2025

Shiji Zhao, Ranjie Duan, Fengxiang Wang et al. · Beihang University · Alibaba Group

Exploits shuffle inconsistency in MLLMs to jailbreak GPT-4o and Claude-3.5-Sonnet via black-box text-image prompt manipulation

Prompt Injection · multimodal · nlp

defense arXiv Apr 13, 2026 · Apr 2026

Junxiao Yang, Haoran Liu, Jinzhe Tu et al. · Tsinghua University · Alibaba Group

Defends LLMs against cross-lingual jailbreaks by anchoring safety alignment in language-agnostic semantic representations rather than surface text

Prompt Injection · nlp

Papers in Database (3)