Lefan Wang

h-index: 3 · 74 citations · 13 papers (total)

Papers in Database (1)

attack · arXiv · Sep 18, 2025

Semantic Representation Attack against Aligned Large Language Models

Jiawei Lian, Jianhong Pan, Lefan Wang et al. · The Hong Kong Polytechnic University · Northwestern Polytechnical University

Jailbreaks safety-aligned LLMs by targeting the semantic representation space rather than exact affirmative token patterns

Prompt Injection · nlp
1 citation