
Defending against Stegomalware in Deep Neural Networks with Permutation Symmetry

Birk Torpmann-Hagen 1,2, Michael A. Riegler 2, Pål Halvorsen 2, Dag Johansen 1

0 citations · 21 references · arXiv


Published on arXiv · 2509.20399

AI Supply Chain Attacks

OWASP ML Top 10 — ML06

Key Finding

Shuffling the column order of weight matrices effectively corrupts stegomalware payloads embedded by state-of-the-art methods at no cost to model accuracy, outperforming pruning and retraining defenses by a significant margin.

Weight Permutation

Novel technique introduced


Deep neural networks are being utilized in a growing number of applications, both in production systems and for personal use. As a consequence, network checkpoints are often shared and distributed on various platforms to ease the development process. This work considers the threat of neural network stegomalware, where malware is embedded in neural network checkpoints at a negligible cost to network accuracy. This constitutes a significant security concern, yet it is largely neglected by deep learning practitioners and security specialists alike. We propose the first effective countermeasure to these attacks. In particular, we show that state-of-the-art neural network stegomalware can be efficiently and effectively neutralized by shuffling the column order of the weight and bias matrices, or equivalently the channel order of convolutional layers. We show that this corrupts payloads embedded by state-of-the-art neural network steganography methods at no cost to network accuracy, outperforming competing methods by a significant margin. We then discuss possible means of bypassing this defense, additional defense methods, and advocate for continued research into the security of machine learning systems.


Key Contributions

  • Weight permutation defense that exploits permutation symmetry of neural network layers (shuffling column order of weight/bias matrices or channel order of conv layers) to corrupt embedded stegomalware payloads
  • Mathematical proof that weight permutation preserves functional equality of the network at zero cost to model accuracy
  • Empirical comparison showing weight permutation outperforms retraining and pruning against state-of-the-art stegomalware (e.g., MaleficNet) including error-correcting-code-hardened variants
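The permutation symmetry underlying the defense can be sketched with a toy two-layer MLP in NumPy (hypothetical shapes and values, not the paper's experimental setup): permuting the hidden units — the columns of the first weight matrix and its bias, together with the rows of the second weight matrix — leaves the network's function unchanged, while the raw byte stream of the checkpoint, where a payload would hide, is reordered.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer MLP: y = relu(x @ W1 + b1) @ W2 + b2
W1, b1 = rng.normal(size=(8, 16)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 4)), rng.normal(size=4)

def forward(x, W1, b1, W2, b2):
    h = np.maximum(x @ W1 + b1, 0.0)
    return h @ W2 + b2

# Random permutation of the 16 hidden units: permute the columns of
# W1/b1 and, to compensate, the rows of W2 with the same permutation.
perm = rng.permutation(16)
W1p, b1p, W2p = W1[:, perm], b1[perm], W2[perm, :]

x = rng.normal(size=(5, 8))
y = forward(x, W1, b1, W2, b2)
yp = forward(x, W1p, b1p, W2p, b2)
assert np.allclose(y, yp)  # function is exactly preserved

# ...but the serialized weights differ, so any ordering-dependent
# payload hidden in them is scrambled.
assert not np.array_equal(W1.tobytes(), W1p.tobytes())
```

The same argument extends to convolutional layers by permuting output channels of one layer and input channels of the next.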

🛡️ Threat Analysis

AI Supply Chain Attacks

The threat model is explicitly about compromised pre-trained models distributed via HuggingFace, PyTorch Hub, and similar model hubs — a textbook AI supply chain attack. The defense neutralizes malicious payloads embedded in model checkpoints before victims load them, targeting the distribution phase of the ML supply chain.
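To see why a pre-load permutation neutralizes such payloads, consider a deliberately simplified LSB-style embedding (illustrative only; MaleficNet spreads its payload with spread-spectrum codes rather than LSBs, but its extraction likewise assumes a fixed weight ordering): bits are written into the mantissa LSBs of the weights in row-major order, so extraction in that same order recovers the payload — unless the columns have been shuffled first.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(16, 64)).astype(np.float32)
payload = rng.integers(0, 2, size=W.size).astype(np.uint32)

def embed(W, bits):
    """Write one payload bit into the mantissa LSB of each weight (row-major)."""
    raw = W.view(np.uint32).copy().ravel()
    raw = (raw & ~np.uint32(1)) | bits
    return raw.view(np.float32).reshape(W.shape)

def extract(W):
    """Read the bits back in the same row-major order they were written."""
    return W.view(np.uint32).ravel() & np.uint32(1)

Ws = embed(W, payload)
assert np.array_equal(extract(Ws), payload)   # intact before the defense

# The defense: shuffle column order before the checkpoint is loaded
# (the next layer would receive the inverse permutation, preserving accuracy).
Wp = Ws[:, rng.permutation(Ws.shape[1])]
recovered = extract(Wp)
assert not np.array_equal(recovered, payload)  # payload corrupted
```

The embedding is barely visible in the weights (only mantissa LSBs change), which is why accuracy-based checks miss it, yet a single reordering destroys the extractor's alignment.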


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
training_time, digital
Datasets
ImageNet
Applications
pre-trained model distribution, neural network checkpoints