Shen Lin

Papers in Database (1)

attack arXiv Mar 6, 2026 · 4w ago

Knowing without Acting: The Disentangled Geometry of Safety Mechanisms in Large Language Models

Jinman Wu, Yi Xie, Shen Lin et al. · Xidian University · Tsinghua University +2 more

Discovers two disentangled safety subspaces in LLMs and exploits them to surgically disable refusal while preserving harmfulness recognition

Prompt Injection nlp
PDF Code