Weiyang Guo

Papers in Database (1)

attack · arXiv · Apr 10, 2026

Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward

Weiyang Guo, Zesheng Shi, Zeen Zhu et al. · Harbin Institute of Technology · Huawei Technologies

Backdoor attack on RLVR-trained LLMs that implants jailbreak triggers using 2% poisoned data, degrading safety by 73%

Model Poisoning Transfer Learning Attack Prompt Injection nlp reinforcement-learning