Latest papers

1 paper
attack · arXiv · Jan 27, 2026

Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection

Quy-Anh Dang, Chris Ngo · VNU University of Science · Knovel Engineering Lab

Norm-preserving activation steering attack bypasses LLM safety alignment with 5.5x higher jailbreak success than prior methods

Prompt Injection nlp