Huan Zhang

attack · arXiv · Sep 30, 2025

Wei Shen, Han Wang, Haoyu Li et al. · University of Illinois Urbana-Champaign

Backdoor attack using GRPO reward flipping to make LLMs produce plausible but incorrect chain-of-thought reasoning on trigger inputs

Model Poisoning · Transfer Learning Attack · nlp

1 citation

Papers in Database (1)