Xingcheng Xu

h-index: 2 · 8 citations · 6 papers (total)

Papers in Database (2)

defense arXiv Feb 4, 2026 · 8w ago

RAPO: Risk-Aware Preference Optimization for Generalizable Safe Reasoning

Zeming Wei, Qiaosheng Zhang, Xia Hu et al. · Shanghai AI Laboratory · Peking University

Risk-aware preference optimization framework that generalizes LRM safe reasoning against diverse jailbreak attacks without sacrificing utility

Prompt Injection nlp
defense arXiv Feb 2, 2026 · 9w ago

MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM Safety

Xiaoyu Wen, Zhida He, Han Qi et al. · Shanghai AI Laboratory · Shanghai Jiao Tong University +1 more

Multi-agent RL co-evolves an LLM attacker and defender, generating novel jailbreaks to train robust safety alignment against unseen prompts

Prompt Injection nlp reinforcement-learning