Defense · 2025

Robust Watermarking on Gradient Boosting Decision Trees

Jun Woo Chung 1, Yingjie Lao 2, Weijie Zhao 1

0 citations · Published on arXiv

arXiv:2511.09822

Model Theft

OWASP ML Top 10 — ML05

Key Finding

The proposed methods achieve high watermark embedding rates with low accuracy degradation, and the embedded watermarks remain robust against adversarial fine-tuning of GBDT models.

GBDT In-Place Watermarking

Novel technique introduced


Gradient Boosting Decision Trees (GBDTs) are widely used in industry and academia for their high accuracy and efficiency, particularly on structured data. However, watermarking GBDT models remains underexplored compared to neural networks. In this work, we present the first robust watermarking framework tailored to GBDT models, utilizing in-place fine-tuning to embed imperceptible and resilient watermarks. We propose four embedding strategies, each designed to minimize impact on model accuracy while ensuring watermark robustness. Through experiments across diverse datasets, we demonstrate that our methods achieve high watermark embedding rates, low accuracy degradation, and strong resistance to post-deployment fine-tuning.


Key Contributions

  • First robust watermarking framework for GBDT models using in-place fine-tuning to embed ownership marks without retraining from scratch
  • Four embedding strategies (Wrong Prediction Flip, Outlier Flip, Cluster Center Flip, Confidence Flip) that minimize accuracy degradation while achieving high watermark embedding rates
  • Empirical demonstration that watermarks survive post-deployment fine-tuning across diverse tabular datasets
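The core "in-place fine-tuning" idea can be sketched in a few lines: instead of retraining, the owner edits the value of a single leaf so that a secret trigger input receives a forced prediction. The sketch below is a minimal illustration only, using scikit-learn's `GradientBoostingClassifier` as a stand-in GBDT; the trigger choice, the single-leaf edit, and the ±0.5 target margin are our own illustrative assumptions, not the paper's four strategies.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Train a stand-in GBDT on synthetic tabular data (assumption: binary task).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)

trigger = X[:1]                        # secret trigger input (illustrative)
orig_pred = model.predict(trigger)[0]
target = 1 - orig_pred                 # watermark: force the opposite class

# Raw margin before editing; positive margin -> class 1 in sklearn's GBDT.
raw = model.decision_function(trigger)[0]

# "In-place fine-tuning": edit the leaf of the last tree that the trigger
# falls into, with no retraining. Each tree's contribution is scaled by the
# learning rate, so divide the required margin shift by it.
est = model.estimators_[-1, 0]
leaf = est.apply(trigger)[0]
delta = ((0.5 if target == 1 else -0.5) - raw) / model.learning_rate
est.tree_.value[leaf, 0, 0] += delta

# The trigger now yields the watermark prediction.
print(model.predict(trigger)[0] == target)
```

Because the edit is localized to one leaf, only inputs routed to that leaf are affected, which is the intuition behind the low accuracy degradation reported in the paper.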

🛡️ Threat Analysis

Model Theft

Watermarks are embedded inside the GBDT model structure itself, forcing specific predictions on secret trigger inputs so the owner can prove ownership and detect unauthorized use. This is model IP protection via ownership watermarking, not content provenance: the paper explicitly frames it as defending against tampering and safeguarding intellectual property rights.
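Ownership is then verified by querying a suspect model on the secret trigger set and measuring how many forced predictions survive. The helper below is a hedged sketch of that check; the `embedding_rate` name, the toy stand-in model, and the trigger/expected values are our own illustrative assumptions.

```python
import numpy as np

def embedding_rate(predict_fn, triggers, expected):
    """Fraction of secret triggers on which the suspect model still returns
    the watermark-forced prediction (1.0 = fully intact watermark)."""
    preds = np.asarray([predict_fn(t) for t in triggers])
    return float(np.mean(preds == np.asarray(expected)))

# Toy stand-in for a suspect model (assumption: sign-of-sum classifier).
suspect = lambda t: int(sum(t) > 0)
triggers = [[1.0, 2.0], [-3.0, 1.0], [0.5, 0.5]]
expected = [1, 0, 1]            # predictions the watermark forces

rate = embedding_rate(suspect, triggers, expected)
print(rate)  # 1.0: all watermark responses match -> strong ownership evidence
```

A high embedding rate after the adversary fine-tunes the stolen model is exactly the robustness property the paper's experiments measure.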


Details

Domains
tabular
Model Types
traditional_ml
Threat Tags
training_time
Applications
tabular classification, model IP protection