Xinlei He

h-index: 6 104 citations 19 papers (total)

Papers in Database (3)

attack arXiv Nov 20, 2025 · Nov 2025

"To Survive, I Must Defect": Jailbreaking LLMs via the Game-Theory Scenarios

Zhen Sun, Zongmin Zhang, Deqi Liang et al. · The Hong Kong University of Science and Technology · East China Normal University +5 more

Game-theoretic black-box jailbreak using Prisoner's Dilemma scenarios to flip LLM safety preferences, achieving 95%+ ASR on GPT-4o and DeepSeek-R1

Prompt Injection nlp
2 citations PDF Code
attack arXiv Dec 22, 2025 · Dec 2025

6DAttack: Backdoor Attacks in the 6DoF Pose Estimation

Jihui Guo, Zongmin Zhang, Zhen Sun et al. · The University of Hong Kong · The Hong Kong University of Science and Technology +2 more

Backdoor attack on 6DoF pose estimation using 3D object triggers to induce controlled erroneous rotations and translations with 100% ASR

Model Poisoning vision
1 citations PDF Code
attack arXiv Nov 18, 2025 · Nov 2025

GRPO Privacy Is at Risk: A Membership Inference Attack Against Reinforcement Learning With Verifiable Rewards

Yule Liu, Heyi Zhang, Jinyi Zheng et al. · The Hong Kong University of Science and Technology · Shanghai Jiao Tong University +2 more

First membership inference attack against RLVR-trained LLMs using behavioral divergence signals instead of memorization

Membership Inference Attack nlpmultimodalreinforcement-learning
1 citations PDF