Xiaohu Yang

h-index: 1 · 1 citation · 2 papers (total)

Papers in Database (2)

defense · arXiv · Jan 5, 2026

Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance

Jiawen Zhang, Lipeng He, Kejia Chen et al. · Zhejiang University · University of Waterloo +2 more

Recovers LLM safety alignment after harmful fine-tuning using a single safety example via low-rank gradient structure

Transfer Learning · Attack · Prompt Injection · NLP

1 citation · PDF
defense · arXiv · Jan 15, 2026

Understanding and Preserving Safety in Fine-Tuned LLMs

Jiawen Zhang, Yangfan Hu, Kejia Chen et al. · Zhejiang University · University of Wisconsin–Madison +4 more

Preserves LLM jailbreak resistance through fine-tuning by projecting utility gradients away from the low-rank safety subspace

Transfer Learning · Attack · Prompt Injection · NLP

PDF · Code
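The second paper's summary describes projecting utility gradients away from a low-rank safety subspace. A minimal sketch of that projection step, assuming the subspace is given by an orthonormal basis `U` (here a random illustrative basis, not the paper's actual construction):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4  # parameter dimension, rank of the hypothetical safety subspace

# Orthonormal basis U for a low-rank "safety subspace" (illustrative random basis).
U, _ = np.linalg.qr(rng.standard_normal((d, r)))

def project_out(g: np.ndarray, U: np.ndarray) -> np.ndarray:
    """Remove the component of gradient g lying in span(U),
    so a fine-tuning update cannot move weights along those directions."""
    return g - U @ (U.T @ g)

g = rng.standard_normal(d)   # a utility-task gradient
g_safe = project_out(g, U)

# After projection, g_safe has no component left in the safety subspace.
print(np.abs(U.T @ g_safe).max())  # near zero, up to floating-point error
```

Fine-tuning with `g_safe` in place of `g` would leave the subspace directions untouched, which is the intuition behind preserving jailbreak resistance during fine-tuning.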