Sendong Zhao

Papers in Database (2)

defense · arXiv · Sep 8, 2025

Anchoring Refusal Direction: Mitigating Safety Risks in Tuning via Projection Constraint

Yanrui Du, Fenglei Fan, Sendong Zhao et al. · Harbin Institute of Technology · City University of Hong Kong +1 more

Defends LLM safety during fine-tuning by anchoring the internal refusal direction via projection-constrained loss regularization

Transfer Learning Attack · Prompt Injection · nlp
defense · arXiv · Sep 8, 2025

MoGU V2: Toward a Higher Pareto Frontier Between Model Usability and Security

Yanrui Du, Fenglei Fan, Sendong Zhao et al. · Harbin Institute of Technology · City University of Hong Kong

Defends LLMs against harmful prompts by dynamically routing between security- and usability-optimized model variants via hidden-state-aware routers

Transfer Learning Attack · Prompt Injection · nlp