Hai Huang

Papers in Database (1)

attack arXiv Feb 9, 2026

Sparse Models, Sparse Safety: Unsafe Routes in Mixture-of-Experts LLMs

Yukun Jiang, Hai Huang, Mingjie Li et al. · CISPA Helmholtz Center for Information Security

Discovers unsafe routing configurations in Mixture-of-Experts (MoE) LLMs that bypass safety alignment; router optimization achieves an attack success rate (ASR) of 0.98 on AdvBench.

Prompt Injection nlp