Yuelin Xu

h-index: 1 · 28 citations · 7 papers (total)

Papers in Database (1)

attack · arXiv · Oct 10, 2025

GREAT: Generalizable Backdoor Attacks in RLHF via Emotion-Aware Trigger Synthesis

Subrat Kishore Dutta, Yuelin Xu, Piyush Pant et al. · CISPA Helmholtz Center for Information Security

Backdoor attack on RLHF preference data using emotion-aware triggers that generalizes to unseen angry-user inputs

Model Poisoning Transfer Learning Attack nlp reinforcement-learning