Weixiang Zhao

h-index: 4 · 46 citations · 7 papers (total)

Papers in Database (2)

defense · arXiv · Feb 12, 2026

SafeNeuron: Neuron-Level Safety Alignment for Large Language Models

Zhaoxin Wang, Jiaming Liang, Fengbin Zhu et al. · Xidian University · National University of Singapore +1 more

Defends LLM safety alignment against neuron pruning attacks by redistributing safety representations across the network via selective neuron freezing

Prompt Injection · nlp · multimodal
defense · arXiv · Feb 1, 2026

Who Transfers Safety? Identifying and Targeting Cross-Lingual Shared Safety Neurons

Xianhui Zhang, Chengyu Xie, Linxia Zhu et al. · Nanjing University of Science and Technology · National University of Singapore +2 more

Identifies sparse cross-lingual safety neurons in LLMs and proposes targeted fine-tuning to close multilingual jailbreak safety gaps

Prompt Injection · nlp