Weiyue Li

h-index: 1 · 191 citations · 6 papers (total)

Papers in Database (1)

attack · arXiv · Feb 6, 2026

TrailBlazer: History-Guided Reinforcement Learning for Black-Box LLM Jailbreaking

Sung-Hoon Yoon, Ruizhi Qian, Minda Zhao et al. · Harvard University · Daegu Gyeongbuk Institute of Science and Technology +1 more

An RL-based black-box jailbreak framework that reweights historical vulnerability signals to attack LLMs more efficiently.

Prompt Injection · NLP