Zihan Guan

h-index: 6 · 119 citations · 13 papers (total)

Papers in Database (1)

defense · arXiv · Feb 24, 2026

Alignment-Weighted DPO: A principled reasoning approach to improve safety alignment

Mengxuan Hu, Vivek V. Datla, Anoop Kumar et al. · University of Virginia · Capital One

Defends LLMs against jailbreaks by training reasoning-aware refusals via CoT datasets and segment-weighted DPO
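The segment-weighted DPO mentioned in the summary can be sketched roughly as follows. Standard DPO scores a preferred/dispreferred response pair by the difference of their summed policy-vs-reference token log-ratios; a segment-weighted variant scales each token's log-ratio by a per-segment weight before summing, so that (for example) the refusal reasoning segment counts more than boilerplate. This is an illustrative assumption, not the paper's actual formulation: the function name and the per-token weight vectors `w_w`, `w_l` are hypothetical.

```python
import math

def seg_weighted_dpo_loss(lp_w, lp_w_ref, lp_l, lp_l_ref, w_w, w_l, beta=0.1):
    """Hypothetical segment-weighted DPO loss (illustrative sketch).

    lp_w / lp_w_ref: per-token log-probs of the preferred response under
    the policy and the frozen reference model; lp_l / lp_l_ref likewise
    for the dispreferred response. w_w / w_l: per-token segment weights.
    """
    # Weighted implicit reward for each response: sum of weighted log-ratios.
    r_w = sum(wt * (a - b) for wt, a, b in zip(w_w, lp_w, lp_w_ref))
    r_l = sum(wt * (a - b) for wt, a, b in zip(w_l, lp_l, lp_l_ref))
    # Standard DPO objective on the weighted rewards: -log sigmoid(beta * margin).
    z = beta * (r_w - r_l)
    return -math.log(1.0 / (1.0 + math.exp(-z)))
```

With all weights equal to 1 this reduces to vanilla DPO; setting a segment's weights to 0 removes its tokens from the preference signal entirely.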

Prompt Injection · nlp