Yong Liu

h-index: 2 · 16 citations · 6 papers (total)

Papers in Database (1)

attack · arXiv · Jan 9, 2026

Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning

Zhaoqi Wang, Zijian Zhang, Daqing He et al. · Beijing Institute of Technology · University of Auckland +2 more

Jailbreaks aligned LLMs by disguising malicious queries as tool calls and using reinforcement learning to iteratively escalate response harmfulness across turns.

Prompt Injection · Insecure Plugin Design · NLP