Jimmy Lin

h-index: 3 · 21 citations · 7 papers (total)

Papers in Database (1)

defense · arXiv · Jan 31, 2026 · 9w ago

Unifying Adversarial Robustness and Training Across Text Scoring Models

Manveer Singh Tamber, Hosna Oyarhoseini, Jimmy Lin · University of Waterloo

A unified adversarial training framework for text scoring LMs that defends against token-manipulation and content-injection attacks, including reward hacking

Input Manipulation Attack · Prompt Injection · nlp