Kesen Zhao

Papers in Database (1)

defense arXiv Mar 23, 2026 · 16d ago

Principled Steering via Null-space Projection for Jailbreak Defense in Vision-Language Models

Xingyu Zhu, Beier Zhu, Shuo Wang et al. · University of Science and Technology of China · National University of Singapore +1 more

Null-space projection defense that blocks VLM jailbreaks while preserving benign performance through theoretically-grounded activation steering

Input Manipulation Attack Prompt Injection multimodalvisionnlp
PDF