Luca Oneto

h-index: 6 118 citations 40 papers (total)

Papers in Database (1)

attack arXiv Nov 11, 2025 · Nov 2025

SOM Directions are Better than One: Multi-Directional Refusal Suppression in Language Models

Giorgio Piras, Raffaele Mura, Fabio Brau et al. · University of Cagliari · University of Genova

Ablates multiple SOM-derived refusal directions from LLM internals to outperform standard jailbreak algorithms at suppressing safety refusal

Prompt Injection nlp
3 citations PDF Code