Data Augmentation Techniques to Reverse-Engineer Neural Network Weights from Input-Output Queries
Alexander Beiser¹, Flavio Martinelli², Wulfram Gerstner², Johanni Brea²
Published on arXiv
arXiv:2511.20312
Model Theft
OWASP ML Top 10 — ML05
Key Finding
New augmentations enable recovery of one-hidden-layer network weights containing up to 100× more parameters than training data points, substantially extending the prior state of the art (limited to ~256-neuron teachers).
Novel Technique Introduced
Biased-noise and grid-composition augmentations (within Expand-and-Cluster)
Network weights can be reverse-engineered given enough informative samples of a network's input-output function. In a teacher-student setup, this amounts to collecting a dataset of the teacher mapping, i.e., querying the teacher, and fitting a student to imitate that mapping. A sensible choice of queries is the dataset the teacher was trained on. But current methods fail when the teacher's parameters outnumber the training data, because the student overfits to the queries instead of aligning its parameters to the teacher's. In this work, we explore augmentation techniques to best sample the input-output mapping of a teacher network, with the goal of eliciting a rich set of representations from the teacher's hidden layers. We find that standard augmentations such as rotation, flipping, and additive noise bring little to no improvement to the identification problem. We therefore design new data augmentation techniques tailored to better sample the representational space of the network's hidden layers. With these augmentations we extend the state-of-the-art range of recoverable network sizes: we recover networks with up to 100 times more parameters than training data points.
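The query-and-imitate loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's method: the architecture sizes, query distribution, and plain gradient-descent fit are all hypothetical choices, and real weight-identification attacks (e.g. Expand-and-Cluster) add machinery far beyond simple imitation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-hidden-layer ReLU teacher (sizes are illustrative,
# not the paper's experimental settings).
D_IN, HIDDEN, D_OUT = 4, 8, 1
W1 = rng.normal(size=(HIDDEN, D_IN)); b1 = rng.normal(size=HIDDEN)
W2 = rng.normal(size=(D_OUT, HIDDEN)); b2 = rng.normal(size=D_OUT)

def teacher(x):
    """Black-box input-output function the attacker may query."""
    return np.maximum(0.0, x @ W1.T + b1) @ W2.T + b2

# Step 1: query the teacher to collect an imitation dataset.
queries = rng.normal(size=(256, D_IN))
responses = teacher(queries)

# Step 2: fit a same-shaped student to the collected mapping with
# plain full-batch gradient descent on mean-squared error.
V1 = 0.1 * rng.normal(size=(HIDDEN, D_IN)); c1 = np.zeros(HIDDEN)
V2 = 0.1 * rng.normal(size=(D_OUT, HIDDEN)); c2 = np.zeros(D_OUT)

def student_mse():
    h = np.maximum(0.0, queries @ V1.T + c1)
    return float(np.mean((h @ V2.T + c2 - responses) ** 2))

init_mse = student_mse()
lr, n = 1e-2, len(queries)
for _ in range(2000):
    h = np.maximum(0.0, queries @ V1.T + c1)   # hidden activations
    err = h @ V2.T + c2 - responses            # (N, D_OUT) residuals
    dh = (err @ V2) * (h > 0)                  # backprop through ReLU
    V2 -= lr * err.T @ h / n;      c2 -= lr * err.mean(axis=0)
    V1 -= lr * dh.T @ queries / n; c1 -= lr * dh.mean(axis=0)

final_mse = student_mse()
print(f"MSE before fit: {init_mse:.3f}  after fit: {final_mse:.3f}")
```

The failure mode the paper targets appears when the student drives this query MSE to zero while its weights remain unaligned with the teacher's, which is exactly what happens once the teacher has far more parameters than there are queries.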
Key Contributions
- Identifies query insufficiency as a critical bottleneck when teachers are overparameterized relative to training data, causing student overfitting instead of weight alignment.
- Proposes two novel augmentation strategies ('biased-noise' and 'grid composition') tailored to elicit diverse hidden-layer activations from the teacher, unlike standard augmentations (rotation, flipping, noise), which yield little improvement.
- Extends the Expand-and-Cluster weight recovery framework to teachers with up to 100× more parameters than available training data points.
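For contrast with the paper's tailored augmentations, the standard isotropic-noise baseline it reports as largely ineffective looks like the sketch below. The function name, noise scale, and copy count are hypothetical illustrations; the paper's biased-noise and grid-composition constructions are specifically shaped to the teacher's hidden-layer geometry and are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

def augment_with_noise(X, copies=8, scale=0.1, rng=rng):
    """Standard additive-noise augmentation: jitter each query.

    The paper finds this kind of generic augmentation brings little
    to no improvement for weight identification, motivating its
    tailored 'biased-noise' and 'grid composition' alternatives.
    """
    noisy = X[None, :, :] + scale * rng.normal(size=(copies, *X.shape))
    return np.concatenate([X, noisy.reshape(-1, X.shape[1])], axis=0)

X = rng.normal(size=(32, 4))   # small original query set
X_aug = augment_with_noise(X)  # originals plus 8 jittered copies each
print(X_aug.shape)             # (32 + 8*32, 4) = (288, 4)
```

Small isotropic jitter keeps queries near the original points, so it re-samples the same hidden-layer activation patterns; the paper's augmentations instead aim to push queries into regions that activate new combinations of hidden units.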
🛡️ Threat Analysis
The paper's primary contribution is improving model extraction: querying a target (teacher) model's input-output function to reconstruct its exact weight parameters. The two new augmentation methods (biased-noise and grid composition) are attack-side improvements that extend the scalability of weight recovery, directly advancing model theft capability.