Mengyao Du

h-index: 1 · 4 citations · 5 papers (total)

Papers in Database (1)

defense · arXiv · Feb 6, 2026

TrapSuffix: Proactive Defense Against Adversarial Suffixes in Jailbreaking

Mengyao Du, Han Fang, Haokai Ma et al. · National University of Defense Technology · National University of Singapore +1 more

Proactive fine-tuning defense traps gradient-based jailbreak suffixes or fingerprints them, cutting LLM attack success below 0.01%

Input Manipulation Attack · Prompt Injection · nlp