Mingjie Li

defense · ICLR · Jan 3, 2025

Mingjie Li, Wai Man Si, Michael Backes et al. · CISPA Helmholtz Center for Information Security · Peking University

Defends LLM safety alignment from LoRA fine-tuning degradation via a fixed safety module and task-specific adapter initialization

Transfer Learning Attack · Prompt Injection · nlp

39 citations · 8 influential

attack · arXiv · Oct 24, 2025

Yukun Jiang, Mingjie Li, Michael Backes et al. · CISPA Helmholtz Center for Information Security

Jailbreaks LLMs by interleaving harmful and benign task words to hide malicious intent from safety guardrails, achieving a 95% attack success rate

Prompt Injection · nlp

9 citations · 1 influential

attack · arXiv · Feb 9, 2026

Yukun Jiang, Hai Huang, Mingjie Li et al. · CISPA Helmholtz Center for Information Security

Discovers unsafe routing configurations in MoE LLMs that bypass safety alignment, achieving a 0.98 attack success rate (ASR) on AdvBench via router optimization

Prompt Injection · nlp

Papers in Database (3)