Guangtai Wang

Papers in Database (1)

attack arXiv Apr 9, 2026 · 6w ago

Silencing the Guardrails: Inference-Time Jailbreaking via Dynamic Contextual Representation Ablation

Wenpeng Xing, Moran Fang, Guangtai Wang et al. · Zhejiang University · Binjiang Institute of Zhejiang University +1 more

Inference-time jailbreak attack that surgically ablates safety guardrails by suppressing refusal-inducing activation patterns in LLM hidden states

Prompt Injection nlp
PDF