Xiting Wang

Papers in Database (2)

defense · arXiv · Oct 17, 2025

Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models

Shuang Liang, Zhihao Xu, Jialing Tao et al.

Defends VLMs against unknown jailbreak attacks via task-specific safety representation learning and unsupervised attack classification

Prompt Injection · vision · multimodal · nlp
defense · arXiv · Aug 8, 2025

Learning to Detect Unseen Jailbreak Attacks in Large Vision-Language Models

Shuang Liang, Zhihao Xu, Jiaqi Weng et al. · Renmin University of China · Alibaba Group

Defends VLMs against unseen jailbreaks by learning safety representations from internal activations without requiring attack data

Input Manipulation Attack · Prompt Injection · vision · nlp · multimodal