Lichao Wu

attack · arXiv · Feb 9, 2026

Jona te Lintelo, Lichao Wu, Stjepan Picek · Radboud University · Technical University of Darmstadt +1 more

Jailbreaks MoE LLMs by silencing safety-critical experts at inference time, boosting attack success from 7.3% to 70.4%

Prompt Injection · nlp

Papers in Database (1)