Shang Liu

h-index: 0 · 0 citations · 2 papers (total)

Papers in Database (1)

attack · arXiv · Feb 6, 2026 · 8w ago

ShallowJail: Steering Jailbreaks against Large Language Models

Shang Liu, Hanyu Pei, Zeyan Liu · University of Louisville

Exploits shallow LLM alignment by injecting activation steering vectors into hidden states, achieving a jailbreak success rate above 90% without gradient optimization.

Prompt Injection · nlp