Xiangman Li

Papers in Database (1)

defense · arXiv · Aug 21, 2025

SafeLLM: Unlearning Harmful Outputs from Large Language Models against Jailbreak Attacks

Xiangman Li, Xiaodong Wu, Qi Li et al. · Queen’s University

Defends LLMs against jailbreak attacks via token-level FFN unlearning that irreversibly removes harmful knowledge pathways

Prompt Injection · NLP
PDF
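The summary above mentions token-level FFN unlearning that irreversibly removes harmful knowledge pathways. As a rough intuition only (this is a hypothetical toy sketch, not SafeLLM's actual algorithm; all array names and sizes are illustrative assumptions), one can picture zeroing out the FFN neurons most strongly activated by a harmful token, so that pathway cannot be recovered from the remaining weights:

```python
import numpy as np

# Toy illustration (NOT the paper's method): sever the FFN pathways a
# harmful token activates most strongly by zeroing their weights.
rng = np.random.default_rng(0)
d_model, d_ff = 8, 32
W_in = rng.normal(size=(d_model, d_ff))   # FFN up-projection (hypothetical)
W_out = rng.normal(size=(d_ff, d_model))  # FFN down-projection (hypothetical)

harmful_emb = rng.normal(size=d_model)    # embedding of a "harmful" token

# Find the neurons the harmful embedding activates most (ReLU activations).
acts = np.maximum(harmful_emb @ W_in, 0.0)
top = np.argsort(acts)[-4:]               # top-4 neurons by activation

# Irreversibly remove those pathways: zero both in- and out-weights.
W_in[:, top] = 0.0
W_out[top, :] = 0.0

# The severed neurons now contribute nothing for this token.
post_acts = np.maximum(harmful_emb @ W_in, 0.0)
print(bool(np.allclose(post_acts[top], 0.0)))  # True
```

This crude weight-surgery picture only conveys why the removal is irreversible (the zeroed weights carry no trace of the original pathway); the actual method in the paper is a learned unlearning procedure.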