Pengtao Kou

h-index: 0 · 0 citations · 1 paper (total)

Papers in Database (1)

attack · arXiv · Jan 9, 2026

Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning

Zhaoqi Wang, Zijian Zhang, Daqing He et al. · Beijing Institute of Technology · University of Auckland +2 more

Jailbreaks aligned LLMs by disguising malicious queries as tool calls and using reinforcement learning to iteratively escalate response harmfulness across turns.

Prompt Injection · Insecure Plugin Design · NLP