Latest papers

1 paper
defense · arXiv · Aug 17, 2025

Rethinking Safety in LLM Fine-tuning: An Optimization Perspective

Minseon Kim, Jin Myung Kwak, Lama Alssum et al. · Microsoft Research · KAIST +5 more

Defends LLM safety during fine-tuning via hyperparameter tuning and EMA momentum, cutting harmful responses from 16% to 5%
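The paper's EMA momentum idea — keeping an exponential moving average of the weights so fine-tuning cannot drift abruptly away from the safety-aligned starting point — can be sketched as follows. This is a minimal illustration with hypothetical names, not the authors' implementation:

```python
def ema_update(ema_params, params, decay=0.999):
    """Pull the EMA copy of the weights toward the current weights.

    Called after each optimizer step; serving the EMA copy damps
    sudden departures from the safety-aligned initialization.
    """
    return {
        name: decay * ema_params[name] + (1.0 - decay) * params[name]
        for name in params
    }

# Toy usage with a scalar "weight" standing in for a parameter tensor.
params = {"w": 1.0}
ema = dict(params)           # initialize EMA from the aligned model
for step in range(3):
    params["w"] += 0.5       # stand-in for a fine-tuning update
    ema = ema_update(ema, params, decay=0.9)
```

A high decay (e.g. 0.999) means the served EMA model lags the raw fine-tuned weights, trading some adaptation speed for stability of the aligned behavior.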

Tags: Transfer Learning · Attack · Prompt Injection · NLP