Maël Jenny

Papers in Database (1)

attack · arXiv · Mar 15, 2026

Activation Surgery: Jailbreaking White-box LLMs without Touching the Prompt

Maël Jenny, Jérémie Dentan, Sonia Vanier et al.

Jailbreaks white-box LLMs by surgically replacing internal activations layer by layer, bypassing safety refusals without modifying the prompt.

Prompt Injection · NLP
PDF
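
The entry above describes replacing a model's internal activations during the forward pass. As a rough intuition only (not the paper's actual method, models, or code), the idea can be sketched with a toy feed-forward network in NumPy: record an activation from a "donor" run, then splice it into another run at the same layer, so everything downstream follows the donor's computation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy 3-layer "model": each layer is a linear map + ReLU.
# Purely illustrative; the paper's models and procedure are not shown here.
weights = [rng.standard_normal((4, 4)) for _ in range(3)]

def forward(x, patch=None):
    """Run the toy model; `patch` maps a layer index to a replacement activation."""
    h = x
    for i, w in enumerate(weights):
        h = np.maximum(h @ w, 0.0)  # linear + ReLU
        if patch is not None and i in patch:
            h = patch[i]  # surgically replace this layer's activation
    return h

x_refused = np.ones(4)      # stands in for an input the model would refuse
x_donor = np.full(4, 0.5)   # stands in for a compliant "donor" run

# 1) Record the donor run's activation at layer 1.
donor_acts = {}
h = x_donor
for i, w in enumerate(weights):
    h = np.maximum(h @ w, 0.0)
    donor_acts[i] = h.copy()

# 2) Re-run the other input, splicing in the donor activation at layer 1.
patched_out = forward(x_refused, patch={1: donor_acts[1]})

# Because layer 1's activation is fully replaced, all downstream layers
# match the donor run, regardless of the original input.
assert np.allclose(patched_out, forward(x_donor))
```

In a real transformer this kind of intervention is typically done with forward hooks on individual layers rather than by rewriting the forward function; the prompt itself is never edited, which is the point of the summary above.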