Raffaele Mura

h-index: 3 · 16 citations · 6 papers (total)

Papers in Database (2)

attack · arXiv · Nov 11, 2025

SOM Directions are Better than One: Multi-Directional Refusal Suppression in Language Models

Giorgio Piras, Raffaele Mura, Fabio Brau et al. · University of Cagliari · University of Genova

Ablates multiple SOM-derived refusal directions from LLM internals, suppressing safety refusals more effectively than standard jailbreak algorithms

Prompt Injection nlp
3 citations · PDF · Code
attack · arXiv · Oct 7, 2025

LatentBreak: Jailbreaking Large Language Models through Latent Space Feedback

Raffaele Mura, Giorgio Piras, Kamilė Lukošiūtė et al. · University of Cagliari · Centre for AI Governance +1 more

White-box LLM jailbreak that uses latent-space-guided word substitutions to produce low-perplexity prompts that evade perplexity-based safety filters

Prompt Injection nlp
1 citation · PDF