Defense · 2025

Protecting Deep Neural Network Intellectual Property with Chaos-Based White-Box Watermarking

Sangeeth B 1, Serena Nicolazzo 2, Deepa K. 1, Vinod P. 1

0 citations · 18 references · arXiv

Published on arXiv

2512.16658

Model Theft

OWASP ML Top 10 — ML05

Key Finding

Watermark remains detectable after fine-tuning with negligible model accuracy loss, and activation-based classifiers successfully distinguish original, watermarked, and tampered models.

Chaos-Based White-Box Watermarking

Novel technique introduced


The rapid proliferation of deep neural networks (DNNs) across several domains has led to increasing concerns regarding intellectual property (IP) protection and model misuse. Trained DNNs represent valuable assets, often developed through significant investments. However, the ease with which models can be copied, redistributed, or repurposed highlights the urgent need for effective mechanisms to assert and verify model ownership. In this work, we propose an efficient and resilient white-box watermarking framework that embeds ownership information into the internal parameters of a DNN using chaotic sequences. The watermark is generated using a logistic map, a well-known chaotic function, producing a sequence that is sensitive to its initialization parameters. This sequence is injected into the weights of a chosen intermediate layer without requiring structural modifications to the model or degradation in predictive performance. To validate ownership, we introduce a verification process based on a genetic algorithm that recovers the original chaotic parameters by optimizing the similarity between the extracted and regenerated sequences. The effectiveness of the proposed approach is demonstrated through extensive experiments on image classification tasks using the MNIST and CIFAR-10 datasets. The results show that the embedded watermark remains detectable after fine-tuning, with negligible loss in model accuracy. In addition to numerical recovery of the watermark, we perform visual analyses using weight density plots and construct activation-based classifiers to distinguish between original, watermarked, and tampered models. Overall, the proposed method offers a flexible and scalable solution for embedding and verifying model ownership in white-box settings, well suited for real-world scenarios where IP protection is critical.
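The abstract's embedding step can be sketched in a few lines: iterate the logistic map x_{n+1} = r·x_n·(1 − x_n) from a secret seed (x0, r), then inject the resulting sequence into an intermediate layer's weights. The additive embedding rule and the strength `alpha` below are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def logistic_sequence(x0, r, length):
    """Chaotic sequence from the logistic map x_{n+1} = r * x_n * (1 - x_n)."""
    seq = np.empty(length)
    x = x0
    for i in range(length):
        x = r * x * (1.0 - x)
        seq[i] = x
    return seq

def embed_watermark(weights, x0, r, alpha=1e-3):
    """Additively embed a chaotic sequence into a flat weight vector.

    The additive rule and alpha are assumptions for illustration; the paper
    only specifies that the sequence is injected into a chosen layer's weights.
    """
    wm = logistic_sequence(x0, r, weights.size)
    # Zero-centre the sequence so the embedding does not bias the weights
    return weights + alpha * (wm - 0.5)

# Example: watermark a hypothetical 64x64 weight matrix
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.05, size=(64, 64))
W_marked = embed_watermark(W.ravel(), x0=0.31, r=3.99).reshape(W.shape)
```

With r near 4 the map is in its chaotic regime, so the sequence (and hence the watermark) is extremely sensitive to the secret seed, which is what makes (x0, r) usable as an ownership key.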


Key Contributions

  • Chaos-based watermark generation using a logistic map embedded into intermediate DNN layer weights without structural model modification or accuracy degradation
  • Genetic algorithm-based ownership verification that recovers original chaotic initialization parameters by maximizing similarity between extracted and regenerated sequences
  • Activation-based classifiers and weight density plots as auxiliary analysis tools to distinguish original, watermarked, and tampered models
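The second contribution, genetic-algorithm recovery of the chaotic initialization parameters, could look roughly like the sketch below. The population bounds, Pearson-correlation fitness, and averaging crossover with Gaussian mutation are assumptions for illustration, not the paper's reported configuration:

```python
import numpy as np

def logistic_sequence(x0, r, length):
    """Chaotic sequence from the logistic map x_{n+1} = r * x_n * (1 - x_n)."""
    seq = np.empty(length)
    x = x0
    for i in range(length):
        x = r * x * (1.0 - x)
        seq[i] = x
    return seq

def fitness(params, extracted):
    """Similarity between the regenerated and extracted sequences (Pearson r)."""
    x0, r = params
    regen = logistic_sequence(x0, r, extracted.size)
    score = np.corrcoef(regen, extracted)[0, 1]
    return 0.0 if np.isnan(score) else score

def ga_recover(extracted, pop_size=60, generations=80, seed=0):
    """Toy GA over (x0, r): keep the fittest half, recombine, mutate."""
    rng = np.random.default_rng(seed)
    # x0 in (0, 1); r restricted to the chaotic regime of the logistic map
    pop = np.column_stack([rng.uniform(0.01, 0.99, pop_size),
                           rng.uniform(3.57, 4.00, pop_size)])
    for _ in range(generations):
        scores = np.array([fitness(p, extracted) for p in pop])
        elite = pop[np.argsort(scores)[-pop_size // 2:]]
        # Crossover: average random elite pairs; mutation: small Gaussian noise
        pairs = elite[rng.integers(0, len(elite), (pop_size, 2))]
        pop = pairs.mean(axis=1) + rng.normal(0.0, [0.01, 0.005], (pop_size, 2))
        pop[:, 0] = np.clip(pop[:, 0], 0.01, 0.99)
        pop[:, 1] = np.clip(pop[:, 1], 3.57, 4.00)
    scores = np.array([fitness(p, extracted) for p in pop])
    return pop[scores.argmax()]

# Demo: search for the seed that best regenerates an extracted sequence
target = logistic_sequence(0.31, 3.99, 500)
x0_hat, r_hat = ga_recover(target)
```

Because chaotic trajectories diverge rapidly for even slightly wrong seeds, the fitness landscape is very spiky; the true parameters score near-perfect similarity while wrong ones score near zero, which is exactly what makes recovered parameters convincing evidence of ownership.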

🛡️ Threat Analysis

Model Theft

Watermark is embedded directly into model weight parameters (not outputs) to prove model ownership and detect unauthorized copying or redistribution — a classic model theft defense. Verification is performed by inspecting internal weight structure in a white-box setting.


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box, training_time
Datasets
MNIST, CIFAR-10
Applications
image classification, model IP protection