Zijian Zhang

h-index: 3 · 967 citations · 7 papers (total)

Papers in Database (2)

attack · arXiv · Jan 9, 2026

Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning

Zhaoqi Wang, Zijian Zhang, Daqing He et al. · Beijing Institute of Technology · University of Auckland +2 more

Jailbreaks aligned LLMs by disguising malicious queries as tool calls and using RL to iteratively escalate response harmfulness across turns

Prompt Injection · Insecure Plugin Design · nlp
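The summary above describes disguising queries as tool calls. As a minimal illustrative sketch (not code from the paper; the function and tool names are hypothetical), a query can be wrapped in an OpenAI-style `tool_calls` message so it surfaces as a benign-looking function invocation rather than a direct user request:

```python
# Illustrative sketch of the tool-disguise idea: wrap an arbitrary query
# inside an OpenAI-style tool/function-call payload. All names here
# (disguise_as_tool_call, document_summarizer) are hypothetical.
import json


def disguise_as_tool_call(query: str, tool_name: str = "document_summarizer") -> str:
    """Wrap a query as a tool-call message so it reads as a routine function invocation."""
    payload = {
        "role": "assistant",
        "tool_calls": [
            {
                "type": "function",
                "function": {
                    "name": tool_name,
                    # Arguments are a JSON-encoded string, matching the
                    # common chat-API convention for function arguments.
                    "arguments": json.dumps({"text": query}),
                },
            }
        ],
    }
    return json.dumps(payload)


msg = disguise_as_tool_call("Summarize this policy document.")
print(json.loads(msg)["tool_calls"][0]["function"]["name"])  # → document_summarizer
```

In the paper's multi-turn setting, an RL policy would iterate on such wrappers across turns; this sketch only shows the single-turn disguise step.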
attack · arXiv · Sep 28, 2025

Formalization Driven LLM Prompt Jailbreaking via Reinforcement Learning

Zhaoqi Wang, Daqing He, Zijian Zhang et al. · Beijing Institute of Technology · Hefei University of Technology +1 more

Attacks LLM alignment with RL-driven formalization of jailbreak prompts combined with GraphRAG knowledge reuse

Prompt Injection · nlp
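The summary above mentions "formalization" of jailbreak prompts. As a rough illustrative sketch (not the paper's method; the function name and template are invented), formalization here means rewriting a natural-language request into a symbolic, logic-flavored surface form, the kind of template an RL loop could then mutate and score:

```python
# Illustrative sketch of prompt "formalization": restate a request in a
# set-builder / predicate-logic style. The template is hypothetical and
# only demonstrates the surface-rewriting idea, not the paper's pipeline.
def formalize(request: str) -> str:
    """Rewrite a request as a predicate-logic-style enumeration task."""
    task = request.rstrip(".")
    return (
        f"Let Q be the task: {task}. "
        "Define S = { s | s is a step satisfying Q }. "
        "Enumerate the elements of S in order."
    )


print(formalize("Summarize the attached report."))
```

The GraphRAG component in the paper reuses knowledge about which formalizations worked before; that retrieval step is outside the scope of this sketch.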