Ke Xu

defense arXiv Sep 29, 2025

Zihao Zhu, Xinyu Wu, Gehan Hu et al. · The Chinese University of Hong Kong · State University of New York at Buffalo +1 more

Adversarial CoT fine-tuning teaches reasoning models to self-correct harmful drifts, improving jailbreak robustness while reducing over-refusal

Prompt Injection nlp

2 citations

Papers in Database (1)