Wai Man Si

h-index: 6 · 183 citations · 10 papers (total)

Papers in Database (1)

defense · ICLR · Jan 3, 2025 · Jan 2025

SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation

Mingjie Li, Wai Man Si, Michael Backes et al. · CISPA Helmholtz Center for Information Security · Peking University

Preserves LLM safety alignment against degradation from LoRA fine-tuning by combining a fixed safety module with task-specific adapter initialization.

Transfer Learning Attack · Prompt Injection · nlp
39 citations · 8 influential