Sihao Wu

Papers in Database (1)

defense arXiv Aug 30, 2025 · Aug 2025

Activation Steering Meets Preference Optimization: Defense Against Jailbreaks in Vision Language Models

Sihao Wu, Gaojie Jin, Wei Huang et al. · University of Liverpool · University of Exeter +2 more

Defends VLMs against visual adversarial jailbreaks via adaptive activation steering vectors refined through sequence-level preference optimization

Input Manipulation Attack Prompt Injection multimodalvisionnlp
PDF