Zhun Deng

h-index: 3 · 22 citations · 4 papers (total)

Papers in Database (1)

attack · arXiv · Feb 14, 2026

Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges

Ruomeng Ding, Yifei Pang, He Sun et al. · University of North Carolina at Chapel Hill · Carnegie Mellon University +2 more

Attacks LLM alignment pipelines by crafting benchmark-compliant rubric edits that systematically bias judge preferences and corrupt RLHF training

Transfer Learning Attack · Prompt Injection · nlp