Zhouxiang Fang

Papers in Database (1)

defense · arXiv · Mar 10, 2026

GR-SAP: Generative Replay for Safety Alignment Preservation during Fine-Tuning

Zhouxiang Fang, Jiawei Zhou, Hanjie Chen · Rice University · Stony Brook University

Defends LLM safety alignment against fine-tuning-induced degradation using generative replay of synthesized safety data

Transfer Learning Attack · Prompt Injection · nlp