Latest papers

1 paper
defense · arXiv · Oct 11, 2025

Backdoor Collapse: Eliminating Unknown Threats via Known Backdoor Aggregation in Language Models

Liang Lin, Miao Yu, Moayad Aloqaily et al. · Nanyang Technological University · University of Science and Technology of China +4 more

Defends LLMs against unknown backdoors by intentionally injecting known triggers to aggregate and then purge backdoor representations

Model Poisoning nlp