Jiaqi Weng

Papers in Database (1)

defense · arXiv · Aug 8, 2025

Learning to Detect Unseen Jailbreak Attacks in Large Vision-Language Models

Shuang Liang, Zhihao Xu, Jiaqi Weng et al. · Renmin University of China · Alibaba Group

Defends VLMs against unseen jailbreaks by learning safety representations from internal activations without requiring attack data

Input Manipulation Attack · Prompt Injection · vision · nlp · multimodal