Guangliang Liu

Papers in Database (1)

defense arXiv Mar 12, 2026 · 25d ago

Deactivating Refusal Triggers: Understanding and Mitigating Overrefusal in Safety Alignment

Zhiyu Xue, Zimo Qi, Guangliang Liu et al. · University of California · Johns Hopkins University +2 more

Analyzes refusal trigger mechanisms in LLM safety alignment to reduce overrefusal while maintaining jailbreak defenses

Prompt Injection nlp
PDF