Hao Li

h-index: 2 · 20 citations · 3 papers (total)

Papers in Database (1)

defense · arXiv · Jan 15, 2026

Be Your Own Red Teamer: Safety Alignment via Self-Play and Reflective Experience Replay

Hao Wang, Yanting Wang, Hao Li et al. · Beihang University · Peking University +1 more

Defends LLMs against jailbreaks via self-play RL, in which a single model concurrently generates and resists adversarial prompts.

Prompt Injection nlp