Zihan Guan

h-index: 6 · 119 citations · 13 papers (total)

Papers in Database (1)

defense · arXiv · Feb 24, 2026

Alignment-Weighted DPO: A principled reasoning approach to improve safety alignment

Mengxuan Hu, Vivek V. Datla, Anoop Kumar et al. · University of Virginia · Capital One

Defends LLMs against jailbreaks by training reasoning-aware refusals via CoT datasets and segment-weighted DPO
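The segment-weighted DPO mentioned in the summary can be sketched roughly as follows. Standard DPO scores a preferred/dispreferred response pair by the difference of their summed policy-vs-reference token log-ratios; a segment-weighted variant scales each token's log-ratio by a per-segment weight before summing, so that (for example) the refusal reasoning segment counts more than boilerplate. This is an illustrative assumption, not the paper's actual formulation: the function name and the per-token weight vectors `w_w`, `w_l` are hypothetical.

```python
import math

def seg_weighted_dpo_loss(lp_w, lp_w_ref, lp_l, lp_l_ref, w_w, w_l, beta=0.1):
    """Hypothetical segment-weighted DPO loss (illustrative sketch).

    lp_w / lp_w_ref: per-token log-probs of the preferred response under
    the policy and the frozen reference model; lp_l / lp_l_ref likewise
    for the dispreferred response. w_w / w_l: per-token segment weights.
    """
    # Weighted implicit reward for each response: sum of weighted log-ratios.
    r_w = sum(wt * (a - b) for wt, a, b in zip(w_w, lp_w, lp_w_ref))
    r_l = sum(wt * (a - b) for wt, a, b in zip(w_l, lp_l, lp_l_ref))
    # Standard DPO objective on the weighted rewards: -log sigmoid(beta * margin).
    z = beta * (r_w - r_l)
    return -math.log(1.0 / (1.0 + math.exp(-z)))
```

With all weights equal to 1 this reduces to vanilla DPO; setting a segment's weights to 0 removes its tokens from the preference signal entirely.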

Prompt Injection · nlp