Shujian Huang

Papers in Database (2)

defense arXiv Aug 21, 2025 · Aug 2025

Peng Ding, Wen Sun, Dailin Li et al. · Meituan Inc. · Dalian University of Technology +1 more

RL defense uses LLMs' own harm-discrimination ability as a reward signal to close the gap between identifying and resisting jailbreaks

Prompt Injection nlp

attack arXiv Nov 1, 2025 · Nov 2025

Peng Ding, Jun Kuang, Wen Sun et al. · Nanjing University · Meituan

Jailbreaks LLMs via minimal intent-shifting text edits, bypassing safety filters with natural human-readable prompts

Prompt Injection nlp