Raffaele Mura

h-index: 3 · 16 citations · 6 papers (total)

Papers in Database (2)

attack arXiv Nov 11, 2025 · Nov 2025

Giorgio Piras, Raffaele Mura, Fabio Brau et al. · University of Cagliari · University of Genova

Ablates multiple SOM-derived refusal directions from LLM internals to outperform standard jailbreak algorithms at suppressing safety refusal

Prompt Injection nlp

3 citations PDF Code

attack arXiv Oct 7, 2025 · Oct 2025

Raffaele Mura, Giorgio Piras, Kamilė Lukošiūtė et al. · University of Cagliari · Centre for AI Governance +1 more

White-box LLM jailbreak using latent-space-guided word substitutions to produce low-perplexity prompts that evade perplexity-based safety filters

Prompt Injection nlp

1 citation PDF