
PickleBall: Secure Deserialization of Pickle-based Machine Learning Models (Extended Report)

Andreas D. Kellas 1, Neophytos Christou 2, Wenxin Jiang 3,4, Penghui Li 1, Laurent Simon 5, Yaniv David 6, Vasileios P. Kemerlis 2, James C. Davis 3, Junfeng Yang 1


Published on arXiv

2508.15987

AI Supply Chain Attacks

OWASP ML Top 10 — ML06

Key Finding

PickleBall rejects 100% of malicious pickle models while correctly loading 79.8% of benign models — loading 22% more benign models than the state-of-the-art restrictive loader, and catching malicious models that evaluated scanners miss.

PickleBall

Novel technique introduced


Machine learning model repositories such as the Hugging Face Model Hub facilitate model exchange. However, bad actors can deliver malware through compromised models. Existing defenses such as safer model formats, restrictive (but inflexible) loading policies, and model scanners have shortcomings: 44.9% of popular models on Hugging Face still use the insecure pickle format, 15% of these cannot be loaded by restrictive loading policies, and model scanners have both false positives and false negatives. Pickle remains the de facto standard for model exchange, and the ML community lacks a tool that offers transparent safe loading. We present PickleBall to help machine learning engineers load pickle-based models safely. PickleBall statically analyzes the source code of a given machine learning library and computes a custom policy that specifies safe load-time behavior for benign models. PickleBall then dynamically enforces the policy during load time as a drop-in replacement for the pickle module. PickleBall generates policies that correctly load 79.8% of benign pickle-based models in our dataset, while rejecting all (100%) malicious examples in our dataset. In comparison, evaluated model scanners fail to identify known malicious models, and the state-of-the-art loader loads 22% fewer benign models than PickleBall. PickleBall removes the threat of arbitrary function invocation from malicious pickle-based models, raising the bar for attackers by forcing them to depend on code-reuse techniques.


Key Contributions

  • Static analysis of ML library source code to automatically compute safe pickle-loading policies tailored to specific frameworks
  • Dynamic policy enforcement as a drop-in replacement for Python's pickle module, blocking arbitrary code execution at load time
  • Empirical study showing 44.9% of popular Hugging Face models use insecure pickle format, with PickleBall correctly loading 79.8% of benign models while rejecting 100% of malicious examples
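The dynamic-enforcement contribution builds on Python's standard hook for restricting unpickling, `pickle.Unpickler.find_class`. The sketch below shows the general allowlist mechanism only; the hard-coded `ALLOWED` set and the `safe_loads` helper are illustrative assumptions, not PickleBall's actual policy or API — PickleBall derives its allowlist automatically by static analysis of the ML library's source.

```python
import io
import pickle

# Illustrative policy: permit only these (module, name) globals during
# unpickling. PickleBall computes such a policy per ML library; this
# hand-written set is a stand-in for demonstration.
ALLOWED = {
    ("collections", "OrderedDict"),
    ("builtins", "dict"),
}

class PolicyUnpickler(pickle.Unpickler):
    """Unpickler that resolves only allowlisted globals."""
    def find_class(self, module, name):
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(
            f"policy violation: {module}.{name} is not permitted")

def safe_loads(data: bytes):
    """Drop-in replacement for pickle.loads under the policy above."""
    return PolicyUnpickler(io.BytesIO(data)).load()
```

A benign payload that only references allowlisted classes loads normally, while a payload that names any other callable (e.g. `os.system`) is rejected with an `UnpicklingError` before the callable is ever invoked.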

🛡️ Threat Analysis

AI Supply Chain Attacks

Directly addresses the supply chain attack vector of trojaned/malware-embedded models distributed via public model hubs (Hugging Face). The threat is bad actors uploading malicious pickle files that execute arbitrary code when loaded by unsuspecting users — a canonical ML supply chain attack. PickleBall is a defense that intercepts this supply chain threat at load time.
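To see why pickle loading is a viable delivery vector: a pickle payload can name an arbitrary callable to be invoked during deserialization via `__reduce__`. The hypothetical `Payload` class below uses a harmless stand-in (`print`) where a real trojaned model would use something like `os.system` — the point is that the call fires at load time, before any model code runs.

```python
import pickle

class Payload:
    """Hypothetical stand-in for a trojaned model object."""
    def __reduce__(self):
        # Pickle serializes this as (callable, args); the callable is
        # invoked during pickle.loads(), i.e. at model load time.
        return (print, ("code executed during unpickling",))

blob = pickle.dumps(Payload())
pickle.loads(blob)  # invokes print() as a side effect of loading
```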


Details

Threat Tags
black_box, inference_time
Datasets
Hugging Face Model Hub (benign and malicious pickle models)
Applications
ML model distribution, model hub security, model loading pipelines