Xiaojun Xu

Papers in Database (1)

defense · arXiv · Sep 15, 2025

Reasoned Safety Alignment: Ensuring Jailbreak Defense via Answer-Then-Check

Chentao Cao, Xiaojun Xu, Bo Han et al. · ByteDance Seed · Hong Kong Baptist University

Defends LLMs against jailbreaks by training models to answer internally first, then evaluate that answer's safety before responding.

Prompt Injection nlp