Giorgio Piras

h-index: 4 · 35 citations · 16 papers (total)

Papers in Database (3)

attack arXiv Nov 11, 2025 · Nov 2025

SOM Directions are Better than One: Multi-Directional Refusal Suppression in Language Models

Giorgio Piras, Raffaele Mura, Fabio Brau et al. · University of Cagliari · University of Genova

Ablates multiple SOM-derived refusal directions from LLM internals to outperform standard jailbreak algorithms at suppressing safety refusal

Prompt Injection nlp
3 citations PDF Code

attack arXiv Oct 7, 2025 · Oct 2025

LatentBreak: Jailbreaking Large Language Models through Latent Space Feedback

Raffaele Mura, Giorgio Piras, Kamilė Lukošiūtė et al. · University of Cagliari · Centre for AI Governance +1 more

White-box LLM jailbreak using latent-space-guided word substitutions to produce low-perplexity prompts that evade perplexity-based safety filters

Prompt Injection nlp
1 citation PDF

defense arXiv Oct 21, 2025 · Oct 2025

S2AP: Score-space Sharpness Minimization for Adversarial Pruning

Giorgio Piras, Qi Zhao, Fabio Brau et al. · University of Cagliari · Karlsruhe Institute of Technology

Plug-in sharpness minimization for adversarial pruning that stabilizes mask selection and improves pruned model robustness against adversarial attacks

Input Manipulation Attack vision
PDF