Mengyu Wang

h-index: 1 · 2 citations · 3 papers (total)

Papers in Database (1)

attack · arXiv · Feb 6, 2026

TrailBlazer: History-Guided Reinforcement Learning for Black-Box LLM Jailbreaking

Sung-Hoon Yoon, Ruizhi Qian, Minda Zhao et al. · Harvard University · Daegu Gyeongbuk Institute of Science and Technology +1 more

RL-based black-box jailbreak framework that reweights historical vulnerability signals to attack LLMs more efficiently

Prompt Injection nlp