Lixu Wang

Papers in Database (1)

defense · arXiv · Aug 28, 2025

Token Buncher: Shielding LLMs from Harmful Reinforcement Learning Fine-Tuning

Weitao Feng, Lixu Wang, Tianyi Wei et al. · Nanyang Technological University · A*STAR +1 more

Defends LLM safety alignment against RL fine-tuning attacks by suppressing response entropy via TokenBuncher

Transfer Learning Attack · Prompt Injection · nlp · reinforcement-learning