Qika Lin

Papers in Database (1)

defense · arXiv · Sep 8, 2025

Anchoring Refusal Direction: Mitigating Safety Risks in Tuning via Projection Constraint

Yanrui Du, Fenglei Fan, Sendong Zhao et al. · Harbin Institute of Technology · City University of Hong Kong +1 more

Defends LLM safety during fine-tuning by anchoring the internal refusal direction via projection-constrained loss regularization

Transfer Learning Attack · Prompt Injection · nlp