Jialing Tao

Papers in Database (3)

defense · arXiv · Aug 8, 2025

Learning to Detect Unseen Jailbreak Attacks in Large Vision-Language Models

Shuang Liang, Zhihao Xu, Jiaqi Weng et al. · Renmin University of China · Alibaba Group

Defends VLMs against unseen jailbreaks by learning safety representations from internal activations without requiring attack data

Input Manipulation Attack · Prompt Injection · vision · nlp · multimodal

attack · arXiv · Jan 9, 2025

Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency

Shiji Zhao, Ranjie Duan, Fengxiang Wang et al. · Beihang University · Alibaba Group

Exploits shuffle inconsistency in MLLMs to jailbreak GPT-4o and Claude-3.5-Sonnet via black-box text-image prompt manipulation

Prompt Injection · multimodal · nlp

defense · arXiv · Apr 13, 2026

LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety

Junxiao Yang, Haoran Liu, Jinzhe Tu et al. · Tsinghua University · Alibaba Group

Defends LLMs against cross-lingual jailbreaks by anchoring safety alignment in language-agnostic semantic representations rather than surface text

Prompt Injection · nlp