Yule Liu

h-index: 7 358 citations 21 papers (total)

Papers in Database (2)

attack arXiv Nov 20, 2025 · Nov 2025

"To Survive, I Must Defect": Jailbreaking LLMs via the Game-Theory Scenarios

Zhen Sun, Zongmin Zhang, Deqi Liang et al. · The Hong Kong University of Science and Technology · East China Normal University +5 more

Game-theoretic black-box jailbreak using Prisoner's Dilemma scenarios to flip LLM safety preferences, achieving 95%+ ASR on GPT-4o and DeepSeek-R1

Prompt Injection nlp
2 citations PDF Code
attack arXiv Nov 18, 2025 · Nov 2025

GRPO Privacy Is at Risk: A Membership Inference Attack Against Reinforcement Learning With Verifiable Rewards

Yule Liu, Heyi Zhang, Jinyi Zheng et al. · The Hong Kong University of Science and Technology · Shanghai Jiao Tong University +2 more

First membership inference attack against RLVR-trained LLMs using behavioral divergence signals instead of memorization

Membership Inference Attack nlpmultimodalreinforcement-learning
1 citations PDF