Yu Yan

h-index: 2 · 8 citations · 3 papers (total)

Papers in Database (1)

defense · arXiv · Jan 9, 2026 · 12w ago

Projecting Out the Malice: A Global Subspace Approach to LLM Detoxification

Zenghao Duan, Zhiyi Yin, Zhichao Shi et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +1 more

Removes a global toxic subspace from LLM FFN weights without retraining, achieving robust detoxification that resists adversarial reactivation.

Prompt Injection · nlp
1 citation · PDF