Yu Yan

defense · arXiv · Jan 9, 2026

Zenghao Duan, Zhiyi Yin, Zhichao Shi et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences · and one other institution

Removes a global toxic subspace from LLM FFN weights, achieving detoxification that resists adversarial reactivation, without retraining.

Prompt Injection nlp

1 citation
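The summary describes removing a subspace from FFN weights. Below is a minimal sketch of the generic linear-algebra step, under stated assumptions: the toxic subspace is already given as orthonormal basis vectors in the model's hidden space, and the columns of the weight matrix live in that space. Each column is then replaced by its component orthogonal to the subspace, i.e. W' = (I - U Uᵀ) W. The function name `project_out_subspace` and the toy matrices are hypothetical. How the paper actually identifies the subspace and which FFN matrices it edits are not stated in this listing.

```python
def project_out_subspace(weight, basis):
    """Hypothetical helper: return (I - U U^T) W without modifying `weight`.

    weight: d x m matrix as a list of d rows; each column is a d-dim vector.
    basis:  list of d-dim vectors, ASSUMED orthonormal (not checked here).
    """
    d = len(weight)
    m = len(weight[0])
    result = [row[:] for row in weight]  # copy so the input stays unchanged
    for u in basis:
        # Coefficient of each column along u: c_j = u . W[:, j].
        # Using the original weight is valid because the basis is orthonormal.
        coeffs = [sum(u[i] * weight[i][j] for i in range(d)) for j in range(m)]
        # Subtract the rank-1 component u c^T from every column.
        for i in range(d):
            for j in range(m):
                result[i][j] -= u[i] * coeffs[j]
    return result


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    # Toy "toxic" direction: the first coordinate axis.
    print(project_out_subspace(W, [[1.0, 0.0, 0.0]]))
```

Because the edit is a closed-form projection rather than a gradient update, it changes the weights without any retraining step, which is the property the summary highlights; whether it is robust to reactivation depends on how the subspace is chosen, which this sketch does not address.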

Papers in Database (1)