Dan Li

h-index: 6 · 218 citations · 27 papers (total)

Papers in Database (1)

defense · arXiv · Jan 15, 2026

Understanding and Preserving Safety in Fine-Tuned LLMs

Jiawen Zhang, Yangfan Hu, Kejia Chen et al. · Zhejiang University · University of Wisconsin–Madison +4 more

Preserves LLM jailbreak resistance during fine-tuning by projecting utility gradients away from the low-rank safety subspace.

Transfer Learning Attack · Prompt Injection · nlp