Bernard Ghanem

Papers in Database (2)

attack · arXiv · Mar 19, 2026

SAVeS: Steering Safety Judgments in Vision-Language Models via Semantic Cues

Carlos Hinojosa, Clemens Grange, Bernard Ghanem · King Abdullah University of Science and Technology · Technical University of Munich

Demonstrates that VLM safety decisions rely on semantic cues rather than visual understanding, which enables automated steering that bypasses safety controls

Input Manipulation Attack · Prompt Injection · multimodal · vision · nlp
defense · arXiv · Aug 28, 2025

Turning the Spell Around: Lightweight Alignment Amplification via Rank-One Safety Injection

Harethah Abu Shairah, Hasan Abed Al Kader Hammoud, George Turkiyyah et al. · King Abdullah University of Science and Technology

Amplifies LLM refusal of jailbreak prompts via rank-one weight steering along refusal directions, with no fine-tuning required

Prompt Injection · nlp