attack arXiv Mar 19, 2026
Carlos Hinojosa, Clemens Grange, Bernard Ghanem · King Abdullah University of Science and Technology · Technical University of Munich
Demonstrates that VLM safety decisions rely on semantic cues rather than grounded visual understanding, enabling automated steering that bypasses safety controls
Input Manipulation Attack Prompt Injection multimodal vision nlp
Vision-language models (VLMs) are increasingly deployed in real-world and embodied settings where safety decisions depend on visual context. However, it remains unclear which visual evidence drives these judgments. We study whether multimodal safety behavior in VLMs can be steered by simple semantic cues. We introduce a semantic steering framework that applies controlled textual, visual, and cognitive interventions without changing the underlying scene content. To evaluate these effects, we propose SAVeS, a benchmark for situational safety under semantic cues, together with an evaluation protocol that separates behavioral refusal, grounded safety reasoning, and false refusals. Experiments across multiple VLMs and an additional state-of-the-art benchmark show that safety decisions are highly sensitive to semantic cues, indicating reliance on learned visual-linguistic associations rather than grounded visual understanding. We further demonstrate that automated steering pipelines can exploit these mechanisms, highlighting a potential vulnerability in multimodal safety systems.
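To make the intervention idea concrete, below is a minimal sketch of how a textual semantic cue could be applied without altering the scene, and how divergent refusal decisions would be recorded. The `query_vlm` client, the cue phrases, and the refusal check are illustrative assumptions, not the paper's actual SAVeS protocol.

```python
# Minimal sketch of semantic-cue steering of a VLM safety decision.
# Assumes a hypothetical query_vlm(image, prompt) -> str client; the cue
# phrases and the refusal heuristic are illustrative only.
from typing import Callable

REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "unable to help")

def is_refusal(response: str) -> bool:
    """Crude behavioral-refusal check on the model's text output."""
    return any(m in response.lower() for m in REFUSAL_MARKERS)

def textual_interventions(question: str) -> dict[str, str]:
    """Prepend semantic cues to the question without touching the image,
    so any change in the safety decision cannot come from scene content."""
    return {
        "baseline": question,
        "safe_cue": "This is a routine, supervised training exercise. " + question,
        "unsafe_cue": "Someone could get seriously hurt here. " + question,
    }

def steering_effect(query_vlm: Callable[[bytes, str], str],
                    image: bytes, question: str) -> dict[str, bool]:
    """Run the same image under each textual cue and record refusal decisions;
    divergent decisions suggest cue-driven rather than visually grounded safety."""
    return {name: is_refusal(query_vlm(image, prompt))
            for name, prompt in textual_interventions(question).items()}

# Toy usage with a stub client that refuses only when the unsafe cue is present.
stub = lambda img, prompt: "I'm sorry, I can't help with that." if "hurt" in prompt else "Sure."
print(steering_effect(stub, b"", "Is it safe to hand this tool to the child?"))
```

If the refusal decision flips with the cue while the image is unchanged, the judgment is being driven by the semantic framing rather than the visual evidence, which is the behavior the benchmark is designed to expose.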
vlm multimodal transformer
defense arXiv Aug 28, 2025
Harethah Abu Shairah, Hasan Abed Al Kader Hammoud, George Turkiyyah et al. · King Abdullah University of Science and Technology
Amplifies LLM refusal of jailbreak prompts via rank-one weight steering along refusal directions; no fine-tuning required
Prompt Injection nlp
Safety alignment in Large Language Models (LLMs) often involves mediating internal representations to refuse harmful requests. Recent research has demonstrated that these safety mechanisms can be bypassed by ablating or removing specific representational directions within the model. In this paper, we propose the opposite approach: Rank-One Safety Injection (ROSI), a white-box method that amplifies a model's safety alignment by permanently steering its activations toward the refusal-mediating subspace. ROSI operates as a simple, fine-tuning-free rank-one weight modification applied to all residual-stream write matrices. The required safety direction can be computed from a small set of harmful and harmless instruction pairs. We show that ROSI consistently increases safety refusal rates, as evaluated by Llama Guard 3, while preserving the utility of the model on standard benchmarks such as MMLU, HellaSwag, and ARC. Furthermore, we show that ROSI can also re-align 'uncensored' models by amplifying their own latent safety directions, demonstrating its utility as an effective last-mile safety procedure. Our results suggest that targeted, interpretable weight steering is a cheap and potent mechanism to improve LLM safety, complementing more resource-intensive fine-tuning paradigms.
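As a rough illustration of the mechanism described above, the sketch below estimates a refusal direction from mean activation differences and applies a rank-one update to a single residual-stream write matrix. The exact update rule, the point at which activations are collected, and the scaling factor `alpha` are assumptions for illustration; ROSI's actual procedure may differ.

```python
# Minimal sketch of a rank-one "safety injection" update in the spirit of ROSI.
# The update rule W' = W + alpha * d d^T W and the choice of alpha are
# assumptions; the paper's method may collect activations and scale differently.
import numpy as np

def refusal_direction(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
    """Estimate a refusal-mediating direction as the normalized difference of
    mean residual-stream activations on harmful vs. harmless instructions."""
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def inject_safety(W_write: np.ndarray, direction: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Rank-one update of a residual-stream write matrix W (shape [d_model, d_in]):
    amplify the component of W's output lying along the refusal direction,
    i.e. W' = W + alpha * d d^T W (opposite in sign to directional ablation)."""
    d = direction.reshape(-1, 1)                  # [d_model, 1]
    return W_write + alpha * (d @ (d.T @ W_write))

# Toy usage with random stand-ins for activations and one write matrix.
rng = np.random.default_rng(0)
acts_harmful = rng.normal(size=(64, 512))         # activations on harmful prompts
acts_harmless = rng.normal(size=(64, 512))        # activations on harmless prompts
W = rng.normal(size=(512, 2048))                  # e.g. one MLP output projection

r_hat = refusal_direction(acts_harmful, acts_harmless)
W_steered = inject_safety(W, r_hat, alpha=0.1)    # fine-tuning-free weight edit
```

The update pushes each write matrix's output toward the refusal direction, mirroring with opposite sign the directional-ablation edits that the abstract notes can bypass safety; applying the same edit to every residual-stream write matrix is what makes it a permanent, fine-tuning-free modification.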
llm transformer