Amanda Minnich

h-index: 4 · 74 citations · 9 papers (total)

Papers in Database (1)

defense arXiv Feb 3, 2026 · 8w ago

The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers

Blake Bullwinkel, Giorgio Severi, Keegan Hines et al. · Microsoft

Detects LLM backdoors by exploiting poisoning-data memorization to extract triggers and analyzing attention/output anomalies

Model Poisoning · nlp