Sonia Vanier

attack arXiv Mar 15, 2026 · 24d ago

Maël Jenny, Jérémie Dentan, Sonia Vanier et al.

Jailbreaks white-box LLMs by surgically replacing internal activations layer-by-layer to bypass safety refusals without modifying prompts

Prompt Injection nlp

Papers in Database (1)