Wenhan Yu

h-index: 1 · 3 citations · 4 papers (total)

Papers in Database (1)

defense · arXiv · Jan 26, 2026

TriPlay-RL: Tri-Role Self-Play Reinforcement Learning for LLM Safety Alignment

Zhewen Tan, Wenhan Yu, Jianfeng Si et al. · Peking University · Qiyuan Tech +1 more

Closed-loop RL framework co-training LLM attacker, defender, and evaluator to iteratively improve safety alignment with minimal annotation

Prompt Injection · nlp · reinforcement-learning