Carlos Hinojosa

Papers in Database (1)

attack · arXiv · Mar 19, 2026 · 18d ago

SAVeS: Steering Safety Judgments in Vision-Language Models via Semantic Cues

Carlos Hinojosa, Clemens Grange, Bernard Ghanem · King Abdullah University of Science and Technology · Technical University of Munich

Demonstrates that VLM safety decisions rely on semantic cues rather than visual understanding, enabling automated steering to bypass safety controls.

Input Manipulation Attack · Prompt Injection · multimodal · vision · nlp