SafePickle: Robust and Generic ML Detection of Malicious Pickle-based ML Models
Hillel Ohayon, Daniel Gilkarov, Ran Dubin
Published on arXiv
2602.19818
AI Supply Chain Attacks
OWASP ML Top 10 — ML06
Key Finding
SafePickle achieves a 90.01% F1-score and correctly classifies all 9 evasive Hide-and-Seek malicious models, far exceeding Modelscan, Fickling, ClamAV, and VirusTotal on the same benchmarks
SafePickle
Novel technique introduced
Model repositories such as Hugging Face increasingly distribute machine learning artifacts serialized with Python's pickle format, exposing users to remote code execution (RCE) risks during model loading. Recent defenses, such as PickleBall, rely on per-library policy synthesis that requires complex system setups and verified benign models, which limits scalability and generalization. In this work, we propose a lightweight, machine-learning-based scanner that detects malicious Pickle-based files without policy generation or code instrumentation. Our approach statically extracts structural and semantic features from Pickle bytecode and applies supervised and unsupervised models to classify files as benign or malicious. We construct and release a labeled dataset of 727 Pickle-based files from Hugging Face and evaluate our models on four datasets: our own, PickleBall (out-of-distribution), Hide-and-Seek (9 advanced evasive malicious models), and synthetic joblib files. Our method achieves 90.01% F1-score compared with 7.23%–62.75% achieved by the SOTA scanners (Modelscan, Fickling, ClamAV, VirusTotal) on our dataset. Furthermore, on the PickleBall data (OOD), it achieves 81.22% F1-score compared with 76.09% achieved by the PickleBall method, while remaining fully library-agnostic. Finally, we show that our method is the only one to correctly parse and classify 9/9 evasive Hide-and-Seek malicious models specially crafted to evade scanners. This demonstrates that data-driven detection can effectively and generically mitigate Pickle-based model file attacks.
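The pipeline the abstract describes is fully static: disassemble the pickle bytecode and derive features, without ever executing the stream. A minimal sketch of that idea using the standard library's `pickletools`; the feature names and the specific "suspicious" opcode set here are illustrative assumptions, not the paper's actual feature engineering:

```python
import pickle
import pickletools
from collections import Counter

# Opcodes that import or invoke callables; their presence is a strong
# signal of a code-execution payload (illustrative choice, not the
# paper's exact feature set).
SUSPICIOUS_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ"}

def extract_features(data: bytes) -> dict:
    """Statically disassemble a pickle stream and build simple features."""
    counts = Counter()
    imported_names = []
    for opcode, arg, _pos in pickletools.genops(data):
        counts[opcode.name] += 1
        if opcode.name == "GLOBAL":
            # Protocol <= 3 path; protocol 4+ uses STACK_GLOBAL with the
            # module/attribute names pushed on the stack instead.
            imported_names.append(arg)
    return {
        "total_ops": sum(counts.values()),
        "suspicious_ops": sum(counts[o] for o in SUSPICIOUS_OPCODES),
        "globals": imported_names,
    }

# A pickle of plain data contains no import/call opcodes at all.
benign = pickle.dumps({"weights": [0.1, 0.2]})
print(extract_features(benign)["suspicious_ops"])  # 0: plain data needs no callables
```

Opcode histograms like `counts` are a natural feature vector for the supervised models the abstract mentions, since they capture structure without trusting the file's contents.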
Key Contributions
- SafePickle: a lightweight ML-based static scanner extracting structural and semantic features from pickle bytecode to classify model files as benign or malicious without policy generation or code instrumentation
- Labeled dataset of 727 pickle-based files collected from Hugging Face, released publicly for future research
- Comprehensive evaluation across four datasets including 9 evasion-optimized Hide-and-Seek malicious models, demonstrating 90.01% F1 vs 7.23%–62.75% for SOTA scanners and correct classification of all evasive samples
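The abstract notes that both supervised and unsupervised models are applied to the extracted features. A toy sketch of the unsupervised angle: score a file by its distance from a centroid of benign opcode-frequency vectors. The benign "model files" here are tiny stand-in pickles and the distance metric is an illustrative choice, not the paper's actual method:

```python
import math
import pickle
import pickletools
from collections import Counter

def opcode_vector(data: bytes) -> Counter:
    """Normalized opcode-frequency vector of a pickle stream (static scan)."""
    c = Counter(op.name for op, _arg, _pos in pickletools.genops(data))
    total = sum(c.values())
    return Counter({k: v / total for k, v in c.items()})

def distance(a: Counter, b: Counter) -> float:
    """Euclidean distance over the union of opcode keys."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in set(a) | set(b)))

# Centroid built from a few benign data-only pickles (toy stand-ins
# for real benign model files).
benign_vecs = [opcode_vector(pickle.dumps(x))
               for x in ([1, 2, 3], {"w": [0.5]}, ("a", 1.0))]
keys = set().union(*benign_vecs)
centroid = Counter({k: sum(v[k] for v in benign_vecs) / len(benign_vecs)
                    for k in keys})

class Evil:
    def __reduce__(self):  # unpickling would call the returned callable
        return (print, ("rce",))

score_benign = distance(opcode_vector(pickle.dumps([4, 5])), centroid)
score_evil = distance(opcode_vector(pickle.dumps(Evil())), centroid)
print(score_evil > score_benign)
```

The anomaly score rises for the malicious pickle because its stream is dominated by string/import/call opcodes that benign data-only pickles never emit.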
🛡️ Threat Analysis
Directly defends against supply chain attacks in which malicious actors upload pickle-serialized model files that execute arbitrary code (RCE) at load time to hubs such as Hugging Face. The attack vector is the model hub's distribution infrastructure; the defense is a pre-load scanner.
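To make the load-time RCE vector concrete: unpickling invokes whatever callable `__reduce__` returns, so merely loading an attacker's file runs attacker code. A harmless demonstration, with `print` standing in for something like `os.system`, showing that a pre-load scan can flag the payload from the raw bytes without ever unpickling them:

```python
import pickle
import pickletools

class EvilStub:
    """Stand-in for a malicious payload: __reduce__ makes unpickling call
    an arbitrary callable. Real attacks put os.system / subprocess here;
    print keeps this demo harmless."""
    def __reduce__(self):
        return (print, ("pwned at load time",))

payload = pickle.dumps(EvilStub())

# Pre-load scan: disassemble the bytes WITHOUT unpickling them.
# genops only parses the stream, so the payload never runs.
ops = [op.name for op, _arg, _pos in pickletools.genops(payload)]
flagged = any(name in {"GLOBAL", "STACK_GLOBAL", "REDUCE"} for name in ops)
print(flagged)  # True: the stream imports and then invokes a callable
```

This is why scanning must happen before `pickle.loads` (or any framework loader built on it) touches an untrusted file.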