ML Security Papers

Latest papers

2 papers

attack · arXiv · Nov 11, 2025

SOM Directions are Better than One: Multi-Directional Refusal Suppression in Language Models

Giorgio Piras, Raffaele Mura, Fabio Brau et al. · University of Cagliari · University of Genova

Ablates multiple refusal directions, derived with self-organizing maps (SOMs), from LLM internal representations, outperforming standard jailbreak algorithms at suppressing safety refusals

Prompt Injection nlp

3 citations

defense · arXiv · Aug 13, 2025

Demystifying the Role of Rule-based Detection in AI Systems for Windows Malware Detection

Andrea Ponte, Luca Demetrio, Luca Oneto et al. · University of Genova · RINA Consulting +1 more

Defends ML malware detectors against adversarial PE evasion by training only on samples that YARA rules do not detect, improving robustness and reducing the attack surface

Input Manipulation Attack tabular

