Dongwon Lee

h-index: 4 · 104 citations · 24 papers (total)

Papers in Database (1)

defense · arXiv · Jan 31, 2026

Steering to Say No: Configurable Refusal via Activation Steering in Vision Language Models

Jiaxi Yang, Shicheng Liu, Yuchen Yang et al. · The Pennsylvania State University

Proposes configurable refusal for vision-language models (VLMs) based on activation steering, adaptively balancing under-refusal and over-refusal.

Prompt Injection · vision · nlp · multimodal