Yukun Jiang

h-index: 5 125 citations 13 papers (total)

Papers in Database (3)

attack arXiv Oct 24, 2025 · Oct 2025

Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency

Yukun Jiang, Mingjie Li, Michael Backes et al. · CISPA Helmholtz Center for Information Security

Jailbreaks LLMs by interleaving harmful and benign task words, hiding malicious intent from safety guardrails with 95% attack success rate

Prompt Injection nlp
9 citations 1 influentialPDF Code
tool arXiv Oct 31, 2025 · Oct 2025

From Evidence to Verdict: An Agent-Based Forensic Framework for AI-Generated Image Detection

Mengfei Liang, Yiting Qu, Yukun Jiang et al. · CISPA Helmholtz Center for Information Security

Multi-agent forensic framework with LLM debate and memory module achieves 97% accuracy on AI-generated image detection

Output Integrity Attack visionnlp
1 citations PDF
attack arXiv Feb 9, 2026 · 8w ago

Sparse Models, Sparse Safety: Unsafe Routes in Mixture-of-Experts LLMs

Yukun Jiang, Hai Huang, Mingjie Li et al. · CISPA Helmholtz Center for Information Security

Discovers unsafe routing configurations in MoE LLMs that bypass safety alignment, achieving 0.98 ASR on AdvBench via router optimization

Prompt Injection nlp
PDF Code