defense 2025

NeuPerm: Disrupting Malware Hidden in Neural Network Parameters by Leveraging Permutation Symmetry

Daniel Gilkarov , Ran Dubin

1 citation · 49 references · arXiv


Published on arXiv: 2510.20367

AI Supply Chain Attacks

OWASP ML Top 10 — ML06

Key Finding

NeuPerm successfully disrupts state-of-the-art stegomalware attacks including error-correcting MaleficNet with near-zero model performance impact, and generalizes to LLMs — a capability no prior disruption method achieved.

NeuPerm

Novel technique introduced


Pretrained deep learning model sharing holds tremendous value for researchers and enterprises alike. It allows them to apply deep learning by fine-tuning models at a fraction of the cost of training a brand-new model. However, model sharing exposes end-users to cyber threats that leverage the models for malicious purposes. Attackers can abuse model sharing by hiding self-executing malware inside neural network parameters and distributing the tampered models, which unsuspecting users may then execute directly, or indirectly as a dependency of other software. In this work, we propose NeuPerm, a simple yet effective way of disrupting such malware by leveraging the theoretical property of neural network permutation symmetry. Our method has little to no effect on model performance, and we empirically show that it successfully disrupts state-of-the-art attacks that were previously addressed only through quantization, a highly complex process. NeuPerm is shown to work on LLMs, a feat no previous comparable work has achieved. The source code is available at https://github.com/danigil/NeuPerm.git.
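The core idea behind permutation symmetry can be illustrated with a toy example. Permuting the hidden units of a layer (and reordering the adjacent weight matrices to match) leaves the network's function unchanged, but scrambles the byte-level layout of the parameters, which is what a weight-embedded payload depends on. The sketch below is illustrative only, assuming a minimal two-layer MLP; NeuPerm's exact procedure for real architectures may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer MLP: y = W2 @ relu(W1 @ x + b1) + b2
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)

def forward(x, W1, b1, W2, b2):
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h + b2

# Draw a random, non-identity permutation of the 8 hidden units
# (hypothetical sanitization step, not NeuPerm's actual code).
p = rng.permutation(8)
while np.array_equal(p, np.arange(8)):
    p = rng.permutation(8)

W1p, b1p = W1[p], b1[p]   # permute rows of W1 and entries of b1
W2p = W2[:, p]            # permute columns of W2 to match

x = rng.normal(size=4)
y = forward(x, W1, b1, W2, b2)
yp = forward(x, W1p, b1p, W2p, b2)

assert np.allclose(y, yp)             # function is unchanged
assert W1.tobytes() != W1p.tobytes()  # byte layout is scrambled
```

A payload written into the bit representation of `W1`'s rows would no longer be readable in `W1p`, while every model output stays numerically identical, which is why the performance impact of such a defense can be near zero.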


Key Contributions

  • NeuPerm: a permutation-symmetry-based sanitization method that disrupts steganographic malware hidden in neural network weights with negligible model performance impact
  • First defense shown to defeat error-correcting steganography (MaleficNet) without the performance degradation of quantization-based approaches
  • Demonstrated effectiveness on LLMs, where prior disruption methods had not been validated

🛡️ Threat Analysis

AI Supply Chain Attacks

The core threat is steganographic malware hidden in pre-trained model weights and distributed via PTM hubs (HuggingFace, TensorFlow Hub) — a canonical ML supply chain attack. NeuPerm is a defense that disrupts these embedded payloads before the model reaches end-users. The paper explicitly cites ML supply chain compromise as the threat vector and references OWASP LLM Top 10 and MITRE ATLAS in that context.


Details

Domains
vision, nlp
Model Types
cnn, llm, transformer
Threat Tags
training_time, digital
Applications
pre-trained model sharing, model hubs, llm deployment