Harethah Abu Shairah

Papers in Database (1)

defense · arXiv · Aug 28, 2025

Turning the Spell Around: Lightweight Alignment Amplification via Rank-One Safety Injection

Harethah Abu Shairah, Hasan Abed Al Kader Hammoud, George Turkiyyah et al. · King Abdullah University of Science and Technology

Amplifies LLM refusal of jailbreak prompts through rank-one weight steering along refusal directions; no fine-tuning required

Prompt Injection nlp