SafePickle: Robust and Generic ML Detection of Malicious Pickle-based ML Models
Hillel Ohayon, Daniel Gilkarov, Ran Dubin
Published on arXiv
2602.19818
AI Supply Chain Attacks
OWASP ML Top 10 — ML06
Key Finding
SafePickle achieves a 90.01% F1-score and correctly classifies all 9 evasive Hide-and-Seek malicious models, far exceeding Modelscan, Fickling, ClamAV, and VirusTotal on the same benchmarks
SafePickle
Novel technique introduced
Model repositories such as Hugging Face increasingly distribute machine learning artifacts serialized with Python's pickle format, exposing users to remote code execution (RCE) risks during model loading. Recent defenses, such as PickleBall, rely on per-library policy synthesis that requires complex system setups and verified benign models, which limits scalability and generalization. In this work, we propose a lightweight, machine-learning-based scanner that detects malicious Pickle-based files without policy generation or code instrumentation. Our approach statically extracts structural and semantic features from Pickle bytecode and applies supervised and unsupervised models to classify files as benign or malicious. We construct and release a labeled dataset of 727 Pickle-based files from Hugging Face and evaluate our models on four datasets: our own, PickleBall (out-of-distribution), Hide-and-Seek (9 advanced evasive malicious models), and synthetic joblib files. Our method achieves 90.01% F1-score compared with 7.23%–62.75% achieved by the SOTA scanners (Modelscan, Fickling, ClamAV, VirusTotal) on our dataset. Furthermore, on the PickleBall data (OOD), it achieves 81.22% F1-score compared with 76.09% achieved by the PickleBall method, while remaining fully library-agnostic. Finally, we show that our method is the only one to correctly parse and classify 9/9 evasive Hide-and-Seek malicious models specially crafted to evade scanners. This demonstrates that data-driven detection can effectively and generically mitigate Pickle-based model file attacks.
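The pipeline the abstract describes is fully static: disassemble the pickle bytecode and derive features, without ever executing the stream. A minimal sketch of that idea using the standard library's `pickletools`; the feature names and the specific "suspicious" opcode set here are illustrative assumptions, not the paper's actual feature engineering:

```python
import pickle
import pickletools
from collections import Counter

# Opcodes that import or invoke callables; their presence is a strong
# signal of a code-execution payload (illustrative choice, not the
# paper's exact feature set).
SUSPICIOUS_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ"}

def extract_features(data: bytes) -> dict:
    """Statically disassemble a pickle stream and build simple features."""
    counts = Counter()
    imported_names = []
    for opcode, arg, _pos in pickletools.genops(data):
        counts[opcode.name] += 1
        if opcode.name == "GLOBAL":
            # Protocol <= 3 path; protocol 4+ uses STACK_GLOBAL with the
            # module/attribute names pushed on the stack instead.
            imported_names.append(arg)
    return {
        "total_ops": sum(counts.values()),
        "suspicious_ops": sum(counts[o] for o in SUSPICIOUS_OPCODES),
        "globals": imported_names,
    }

# A pickle of plain data contains no import/call opcodes at all.
benign = pickle.dumps({"weights": [0.1, 0.2]})
print(extract_features(benign)["suspicious_ops"])  # 0: plain data needs no callables
```

Opcode histograms like `counts` are a natural feature vector for the supervised models the abstract mentions, since they capture structure without trusting the file's contents.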
Key Contributions
- SafePickle: a lightweight ML-based static scanner extracting structural and semantic features from pickle bytecode to classify model files as benign or malicious without policy generation or code instrumentation
- Labeled dataset of 727 pickle-based files collected from Hugging Face, released publicly for future research
- Comprehensive evaluation across four datasets including 9 evasion-optimized Hide-and-Seek malicious models, demonstrating 90.01% F1 vs 7.23%–62.75% for SOTA scanners and correct classification of all evasive samples
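The abstract notes that both supervised and unsupervised models are applied to the extracted features. A toy sketch of the unsupervised angle: score a file by its distance from a centroid of benign opcode-frequency vectors. The benign "model files" here are tiny stand-in pickles and the distance metric is an illustrative choice, not the paper's actual method:

```python
import math
import pickle
import pickletools
from collections import Counter

def opcode_vector(data: bytes) -> Counter:
    """Normalized opcode-frequency vector of a pickle stream (static scan)."""
    c = Counter(op.name for op, _arg, _pos in pickletools.genops(data))
    total = sum(c.values())
    return Counter({k: v / total for k, v in c.items()})

def distance(a: Counter, b: Counter) -> float:
    """Euclidean distance over the union of opcode keys."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in set(a) | set(b)))

# Centroid built from a few benign data-only pickles (toy stand-ins
# for real benign model files).
benign_vecs = [opcode_vector(pickle.dumps(x))
               for x in ([1, 2, 3], {"w": [0.5]}, ("a", 1.0))]
keys = set().union(*benign_vecs)
centroid = Counter({k: sum(v[k] for v in benign_vecs) / len(benign_vecs)
                    for k in keys})

class Evil:
    def __reduce__(self):  # unpickling would call the returned callable
        return (print, ("rce",))

score_benign = distance(opcode_vector(pickle.dumps([4, 5])), centroid)
score_evil = distance(opcode_vector(pickle.dumps(Evil())), centroid)
print(score_evil > score_benign)
```

The anomaly score rises for the malicious pickle because its stream is dominated by string/import/call opcodes that benign data-only pickles never emit.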
🛡️ Threat Analysis
Directly defends against supply chain attacks in which malicious actors upload pickle-serialized model files that execute arbitrary code (RCE) at load time to hubs such as Hugging Face. The attack vector is the model hub's distribution infrastructure; the defense is a pre-load scanner.
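To make the load-time RCE vector concrete: unpickling invokes whatever callable `__reduce__` returns, so merely loading an attacker's file runs attacker code. A harmless demonstration, with `print` standing in for something like `os.system`, showing that a pre-load scan can flag the payload from the raw bytes without ever unpickling them:

```python
import pickle
import pickletools

class EvilStub:
    """Stand-in for a malicious payload: __reduce__ makes unpickling call
    an arbitrary callable. Real attacks put os.system / subprocess here;
    print keeps this demo harmless."""
    def __reduce__(self):
        return (print, ("pwned at load time",))

payload = pickle.dumps(EvilStub())

# Pre-load scan: disassemble the bytes WITHOUT unpickling them.
# genops only parses the stream, so the payload never runs.
ops = [op.name for op, _arg, _pos in pickletools.genops(payload)]
flagged = any(name in {"GLOBAL", "STACK_GLOBAL", "REDUCE"} for name in ops)
print(flagged)  # True: the stream imports and then invokes a callable
```

This is why scanning must happen before `pickle.loads` (or any framework loader built on it) touches an untrusted file.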