Xiangnan He

h-index: 17 · 1,341 citations · 47 papers (total)

Papers in Database (1)

attack · arXiv · Sep 24, 2025 · Sep 2025

bi-GRPO: Bidirectional Optimization for Jailbreak Backdoor Injection on LLMs

Wence Ji, Jiancan Wu, Aiying Li et al. · University of Science and Technology of China

RL-based backdoor injection into LLMs using bidirectional GRPO achieves >99% jailbreak attack success rate with preserved stealth

Model Poisoning · Transfer Learning Attack · nlp