Yuhao Wu

h-index: 6 · 124 citations · 21 papers (total)

Papers in Database (1)

attack · arXiv · Dec 19, 2025

AdvJudge-Zero: Binary Decision Flips in LLM-as-a-Judge via Adversarial Control Tokens

Tung-Ling Li, Yuhao Wu, Hongliang Liu · Palo Alto Networks

Adversarial control tokens found by beam search flip the binary decisions of an LLM-as-a-Judge in RLHF pipelines, enabling reward hacking with low-perplexity sequences.

Input Manipulation Attack · Prompt Injection · nlp