Biwei Huang

h-index: 2 18 citations 9 papers (total)

Papers in Database (1)

defense arXiv Sep 26, 2025 · Sep 2025

Backdoor Attribution: Elucidating and Controlling Backdoor in Language Models

Miao Yu, Zhenhong Zhou, Moayad Aloqaily et al. · University of Science and Technology of China · Nanyang Technological University +5 more

Mechanistic interpretability framework identifies backdoor-responsible attention heads in LLMs, enabling surgical neutralization or amplification of backdoor behavior

Model Poisoning nlp
1 citations PDF