Jun Luo

h-index: 3 · 17 citations · 6 papers (total)

Papers in Database (2)

attack · arXiv · Oct 3, 2025

Attack via Overfitting: 10-shot Benign Fine-tuning to Jailbreak LLMs

Zhixin Xie, Xurui Song, Jun Luo · Nanyang Technological University

Jailbreaks LLMs via two-stage overfitting fine-tuning with 10 benign QA pairs, bypassing moderation entirely

Transfer Learning Attack · Prompt Injection · nlp
5 citations · PDF · Code
attack · arXiv · Jan 18, 2026

TrojanPraise: Jailbreak LLMs via Benign Fine-Tuning

Zhixin Xie, Xurui Song, Jun Luo · Nanyang Technological University

Jailbreaks LLMs via benign fine-tuning data containing a crafted trigger word that shifts safety attitude while evading moderation

Transfer Learning Attack · Prompt Injection · nlp
2 citations · PDF