Tingwen Liu

Papers in Database (1)

defense · arXiv · Aug 27, 2025

Safety Alignment Should Be Made More Than Just A Few Attention Heads

Chao Huang, Zefeng Zhang, Juewei Yue et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences

Defends LLMs against jailbreaks by distributing safety alignment across more attention heads, using attention-head dropout during training so that safety behavior does not concentrate in just a few heads
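The general idea of attention-head dropout can be illustrated with a minimal sketch. This is not the paper's implementation; the function name, data layout (a list of per-head output vectors), and the inverted-dropout scaling are assumptions chosen for illustration.

```python
import random

def head_dropout(head_outputs, p=0.25, training=True, rng=None):
    """Randomly zero whole attention heads during training.

    head_outputs: list of per-head output vectors (lists of floats).
    p: probability of dropping each head. Surviving heads are scaled
    by 1/(1-p) (inverted dropout) so the expected combined output is
    unchanged. Dropping entire heads forces the model to spread the
    behavior it is learning across many heads instead of a few.
    """
    if not training or p <= 0:
        return head_outputs
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    result = []
    for h in head_outputs:
        if rng.random() < p:
            result.append([0.0] * len(h))   # this head is silenced this step
        else:
            result.append([x * scale for x in h])
    return result
```

At inference time (`training=False`) all heads are kept unscaled, as in standard dropout.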

Prompt Injection · NLP
PDF