Qingyu Yin

h-index: 2 28 citations 3 papers (total)

Papers in Database (1)

defense arXiv Oct 7, 2025 · Oct 2025

Refusal Falls off a Cliff: How Safety Alignment Fails in Reasoning?

Qingyu Yin, Chak Tou Leong, Linyi Yang et al. · Zhejiang University · Xiaohongshu Inc. +6 more

Reveals mechanistic cause of safety alignment failure in reasoning LLMs and proposes data-efficient alignment repair via refusal cliff data selection

Prompt Injection nlp
2 citations PDF Code