Latest papers

2 papers
attack · arXiv · Nov 11, 2025

SOM Directions are Better than One: Multi-Directional Refusal Suppression in Language Models

Giorgio Piras, Raffaele Mura, Fabio Brau et al. · University of Cagliari · University of Genova

Ablates multiple SOM-derived refusal directions from LLM internals, outperforming standard jailbreak algorithms at suppressing safety refusals

Prompt Injection nlp
3 citations
defense · arXiv · Aug 13, 2025

Demystifying the Role of Rule-based Detection in AI Systems for Windows Malware Detection

Andrea Ponte, Luca Demetrio, Luca Oneto et al. · University of Genova · RINA Consulting +1 more

Defends ML malware detectors against adversarial PE evasion by training only on YARA-undetected samples, improving robustness and reducing attack surface

Input Manipulation Attack tabular