Haihui Fan

h-index: 4 · 57 citations · 34 papers (total)

Papers in Database (1)

defense · arXiv · Nov 16, 2025

Uncovering and Aligning Anomalous Attention Heads to Defend Against NLP Backdoor Attacks

Haotian Jin, Yang Li, Haihui Fan et al. · Chinese Academy of Sciences · State Key Laboratory of Cyberspace Security Defense +1 more

Defends LLMs against backdoor attacks by detecting abnormal inter-head attention similarity and realigning contaminated attention heads through fine-tuning.

Model Poisoning · NLP
1 citation