Huan Zhang

h-index: 2 · 26 citations · 5 papers (total)

Papers in Database (1)

attack · arXiv · Sep 30, 2025

DecepChain: Inducing Deceptive Reasoning in Large Language Models

Wei Shen, Han Wang, Haoyu Li et al. · University of Illinois Urbana-Champaign

Backdoor attack using GRPO reward flipping to make LLMs produce plausible but incorrect chain-of-thought reasoning on trigger inputs

Model Poisoning · Transfer Learning Attack · nlp
1 citation · PDF · Code