Yinpeng Dong

h-index: 42 · 11,721 citations · 118 papers (total)

Papers in Database (1)

defense · arXiv · Sep 29, 2025

Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention

Yichi Zhang, Yue Ding, Jingwen Yang et al. · arXiv · Shanghai Qi Zhi Institute +3 more

Defends Large Reasoning Models against jailbreaks by aligning the safety of their chain-of-thought (CoT) reasoning through process-supervised preference optimization with corrective interventions.

Prompt Injection · nlp
2 citations · 1 influential