Data Augmentation Techniques to Reverse-Engineer Neural Network Weights from Input-Output Queries
Alexander Beiser¹, Flavio Martinelli², Wulfram Gerstner², Johanni Brea²
Published on arXiv
arXiv:2511.20312
Model Theft
OWASP ML Top 10 — ML05
Key Finding
New augmentations enable recovery of one-hidden-layer network weights containing up to 100× more parameters than training data points, substantially extending the prior state of the art (limited to ~256-neuron teachers).
Novel Technique Introduced
Biased-noise and grid-composition augmentations (within Expand-and-Cluster)
Network weights can be reverse-engineered given enough informative samples of a network's input-output function. In a teacher-student setup, this amounts to collecting a dataset of the teacher mapping, i.e., querying the teacher, and fitting a student to imitate that mapping. A sensible choice of queries is the dataset the teacher was trained on. But current methods fail when the teacher's parameters outnumber the training data, because the student overfits to the queries instead of aligning its parameters to the teacher's. In this work, we explore augmentation techniques to best sample the input-output mapping of a teacher network, with the goal of eliciting a rich set of representations from the teacher's hidden layers. We find that standard augmentations such as rotation, flipping, and additive noise bring little to no improvement to the identification problem. We therefore design new data augmentation techniques tailored to better sample the representational space of the network's hidden layers. With these augmentations we extend the state-of-the-art range of recoverable network sizes: we recover networks with up to 100 times more parameters than training data points.
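The query-and-imitate loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's method: the architecture sizes, query distribution, and plain gradient-descent fit are all hypothetical choices, and real weight-identification attacks (e.g. Expand-and-Cluster) add machinery far beyond simple imitation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-hidden-layer ReLU teacher (sizes are illustrative,
# not the paper's experimental settings).
D_IN, HIDDEN, D_OUT = 4, 8, 1
W1 = rng.normal(size=(HIDDEN, D_IN)); b1 = rng.normal(size=HIDDEN)
W2 = rng.normal(size=(D_OUT, HIDDEN)); b2 = rng.normal(size=D_OUT)

def teacher(x):
    """Black-box input-output function the attacker may query."""
    return np.maximum(0.0, x @ W1.T + b1) @ W2.T + b2

# Step 1: query the teacher to collect an imitation dataset.
queries = rng.normal(size=(256, D_IN))
responses = teacher(queries)

# Step 2: fit a same-shaped student to the collected mapping with
# plain full-batch gradient descent on mean-squared error.
V1 = 0.1 * rng.normal(size=(HIDDEN, D_IN)); c1 = np.zeros(HIDDEN)
V2 = 0.1 * rng.normal(size=(D_OUT, HIDDEN)); c2 = np.zeros(D_OUT)

def student_mse():
    h = np.maximum(0.0, queries @ V1.T + c1)
    return float(np.mean((h @ V2.T + c2 - responses) ** 2))

init_mse = student_mse()
lr, n = 1e-2, len(queries)
for _ in range(2000):
    h = np.maximum(0.0, queries @ V1.T + c1)   # hidden activations
    err = h @ V2.T + c2 - responses            # (N, D_OUT) residuals
    dh = (err @ V2) * (h > 0)                  # backprop through ReLU
    V2 -= lr * err.T @ h / n;      c2 -= lr * err.mean(axis=0)
    V1 -= lr * dh.T @ queries / n; c1 -= lr * dh.mean(axis=0)

final_mse = student_mse()
print(f"MSE before fit: {init_mse:.3f}  after fit: {final_mse:.3f}")
```

The failure mode the paper targets appears when the student drives this query MSE to zero while its weights remain unaligned with the teacher's, which is exactly what happens once the teacher has far more parameters than there are queries.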
Key Contributions
- Identifies query insufficiency as a critical bottleneck when teachers are overparameterized relative to training data, causing student overfitting instead of weight alignment.
- Proposes two novel augmentation strategies ('biased-noise' and 'grid composition') tailored to elicit diverse hidden-layer activations from the teacher, unlike standard augmentations (rotation, flipping, noise), which yield little improvement.
- Extends the Expand-and-Cluster weight recovery framework to teachers with up to 100× more parameters than available training data points.
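For contrast with the paper's tailored augmentations, the standard isotropic-noise baseline it reports as largely ineffective looks like the sketch below. The function name, noise scale, and copy count are hypothetical illustrations; the paper's biased-noise and grid-composition constructions are specifically shaped to the teacher's hidden-layer geometry and are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

def augment_with_noise(X, copies=8, scale=0.1, rng=rng):
    """Standard additive-noise augmentation: jitter each query.

    The paper finds this kind of generic augmentation brings little
    to no improvement for weight identification, motivating its
    tailored 'biased-noise' and 'grid composition' alternatives.
    """
    noisy = X[None, :, :] + scale * rng.normal(size=(copies, *X.shape))
    return np.concatenate([X, noisy.reshape(-1, X.shape[1])], axis=0)

X = rng.normal(size=(32, 4))   # small original query set
X_aug = augment_with_noise(X)  # originals plus 8 jittered copies each
print(X_aug.shape)             # (32 + 8*32, 4) = (288, 4)
```

Small isotropic jitter keeps queries near the original points, so it re-samples the same hidden-layer activation patterns; the paper's augmentations instead aim to push queries into regions that activate new combinations of hidden units.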
🛡️ Threat Analysis
The paper's primary contribution is improving model extraction: querying a target (teacher) model's input-output function to reconstruct its exact weight parameters. The two new augmentation methods (biased-noise and grid composition) are attack-side improvements that extend the scalability of weight recovery, directly advancing model theft capability.