Gary Geunbae Lee

h-index: 3 · 33 citations · 11 papers (total)

Papers in Database (1)

defense · arXiv · Jan 7, 2026 · 12w ago

Merging Triggers, Breaking Backdoors: Defensive Poisoning for Instruction-Tuned Language Models

San Kim, Gary Geunbae Lee · POSTECH

Defends instruction-tuned LLMs against backdoor attacks by merging attacker and defensive triggers, then breaking the combined representation via weight recovery.

Model Poisoning · nlp