Keegan Hines

h-index: 3 · 326 citations · 8 papers (total)

Papers in Database (3)

attack · arXiv · Oct 16, 2025

Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-based Optimizers

Andrew Zhao, Reshmi Ghosh, Vitor Carvalho et al. · Tsinghua University · Microsoft

Discovers that LLM-based prompt optimizers are highly vulnerable to feedback poisoning, introducing a fake-reward attack that raises the harmful attack success rate (ASR) by 0.48

Data Poisoning Attack · Prompt Injection · nlp
1 citation · PDF
attack · arXiv · Feb 5, 2026

GRP-Obliteration: Unaligning LLMs With a Single Unlabeled Prompt

Mark Russinovich, Yanan Cai, Keegan Hines et al. · Microsoft

Uses GRPO reinforcement fine-tuning with a single prompt to strip safety alignment from LLMs and diffusion models, outperforming prior unalignment attacks

Transfer Learning Attack · Prompt Injection · nlp · generative
PDF
defense · arXiv · Feb 3, 2026

The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers

Blake Bullwinkel, Giorgio Severi, Keegan Hines et al. · Microsoft

Detects LLM backdoors by exploiting memorization of poisoning data to extract triggers and by analyzing attention/output anomalies

Model Poisoning · nlp
PDF