Haozhong Wang

h-index: 2 · 5 citations · 4 papers (total)

Papers in Database (1)

defense · arXiv · Jan 12, 2026

Safeguarding LLM Fine-tuning via Push-Pull Distributional Alignment

Haozhong Wang, Zhuo Li, Yibo Yang et al. · Jilin University

Preserves LLM safety alignment during fine-tuning by using Optimal Transport-based distributional reweighting to shift the fine-tuning data distribution away from harmful examples.
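A minimal, illustrative sketch of the general idea, not the paper's actual method: entropic-regularized Optimal Transport (Sinkhorn) between sample distributions, plus a hypothetical "push-pull" reweighting that upweights samples near a safe anchor and downweights those near a harmful anchor. All function names, anchors, and the temperature `tau` are assumptions for illustration.

```python
import numpy as np

def sinkhorn(cost, a, b, reg=0.1, n_iters=200):
    """Entropic-regularized OT plan between histograms a and b (Sinkhorn iterations)."""
    K = np.exp(-cost / reg)          # Gibbs kernel from the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # scale columns to match marginal b
        u = a / (K @ v)              # scale rows to match marginal a
    return u[:, None] * K * v[None, :]  # transport plan with marginals (a, b)

def push_pull_weights(X, safe_anchor, harmful_anchor, tau=1.0):
    """Hypothetical reweighting: pull toward a safe anchor, push from a harmful one."""
    d_safe = np.linalg.norm(X - safe_anchor, axis=1)   # distance to safe region
    d_harm = np.linalg.norm(X - harmful_anchor, axis=1)  # distance to harmful region
    logits = (d_harm - d_safe) / tau  # positive when a sample sits nearer the safe anchor
    w = np.exp(logits - logits.max())  # softmax for a normalized weight per sample
    return w / w.sum()
```

Samples close to the harmful anchor receive near-zero weight, so they contribute little to the fine-tuning loss; the Sinkhorn plan could then be used to measure how far the reweighted distribution has moved.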

Transfer Learning · Attack · Prompt Injection · NLP