Ke Xu

h-index: 3 · 43 citations · 4 papers (total)

Papers in Database (1)

defense arXiv Sep 29, 2025 · Sep 2025

AdvChain: Adversarial Chain-of-Thought Tuning for Robust Safety Alignment of Large Reasoning Models

Zihao Zhu, Xinyu Wu, Gehan Hu et al. · The Chinese University of Hong Kong · State University of New York at Buffalo +1 more

Adversarial CoT fine-tuning teaches reasoning models to self-correct harmful drifts, improving jailbreak robustness while reducing over-refusal

Prompt Injection nlp
2 citations · PDF