Jiawei Lian

attack arXiv Sep 18, 2025

Jiawei Lian, Jianhong Pan, Lefan Wang et al. · The Hong Kong Polytechnic University · Northwestern Polytechnical University

Jailbreaks safety-aligned LLMs by targeting semantic representation space rather than exact affirmative token patterns

Prompt Injection nlp

1 citation

attack arXiv Jan 20, 2026

Tairan Huang, Qingqing Ye, Yulin Jin et al. · The Hong Kong Polytechnic University

Diffusion-generated floor patch triggers bypass real-world safety control stacks to reliably activate backdoors in RL robot policies

Model Poisoning reinforcement-learning vision

Papers in Database (2)