Zhuangdi Zhu

defense · arXiv · Jan 29, 2026

Yisheng Zhong, Zhengbang Yang, Zhuangdi Zhu · George Mason University

Distillation-based LLM unlearning embeds refusal into model parameters to resist reverse-prompt attacks that recover forgotten sensitive knowledge

Sensitive Information Disclosure · Prompt Injection · nlp

Papers in Database (1)