Published on arXiv

2603.18793

Model Theft

OWASP ML Top 10 — ML05

Model Theft

OWASP LLM Top 10 — LLM10

Key Finding

Achieves superior detection accuracy and statistical verifiability under fine-tuning, quantization, pruning, and knowledge distillation attacks, outperforming existing SOTA methods

Functional Subspace Watermarking (FSW)

Novel technique introduced


Model watermarking utilizes internal representations to protect the ownership of large language models (LLMs). However, these features inevitably undergo complex distortions during realistic model modifications such as fine-tuning, quantization, or knowledge distillation, making reliable extraction extremely challenging. Despite extensive research on model-side watermarking, existing methods still lack sufficient robustness against parameter-level perturbations. To address this gap, we propose Functional Subspace Watermarking (FSW), a framework that anchors ownership signals in a low-dimensional functional backbone. Specifically, we first solve a generalized eigenvalue problem to extract a stable functional subspace for watermark injection, and introduce an adaptive spectral truncation strategy to balance robustness against model utility. Furthermore, a vector consistency constraint ensures that watermark injection does not compromise the original semantic performance. Extensive experiments across various LLM architectures and datasets demonstrate that our method achieves superior detection accuracy and statistical verifiability under multiple model attacks, with robustness that outperforms existing state-of-the-art (SOTA) methods.


Key Contributions

  • Functional subspace extraction via generalized eigenvalue problem for stable watermark anchoring
  • Adaptive spectral truncation strategy balancing robustness and utility
  • Vector consistency constraint preserving semantic performance during watermark injection
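The paper itself does not publish its matrices or code here, but the first two contributions can be sketched generically: solve a generalized eigenvalue problem A v = λ B v and keep only the leading eigenvectors whose eigenvalue mass reaches an energy threshold (a simple stand-in for the adaptive spectral truncation). In the sketch below, `A` (an activation covariance) and `B` (an SPD reference metric), the `energy` parameter, and the function name `functional_subspace` are all illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np
from scipy.linalg import eigh

def functional_subspace(A, B, energy=0.95):
    """Sketch: extract a subspace via the generalized eigenproblem A v = lam B v,
    truncating the spectrum to retain `energy` fraction of total eigenvalue mass.
    A must be symmetric PSD, B symmetric positive definite."""
    eigvals, eigvecs = eigh(A, B)          # ascending order
    order = np.argsort(eigvals)[::-1]      # re-sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cum, energy) + 1)  # smallest k reaching the threshold
    return eigvecs[:, :k], eigvals[:k]

# Illustrative usage with synthetic data (the paper's real inputs would be
# statistics of internal LLM representations):
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))
A = X.T @ X / 200                              # toy "activation covariance"
B = 0.5 * np.eye(16) + np.diag(rng.random(16)) # toy SPD reference metric
U, lam = functional_subspace(A, B, energy=0.9)
```

The returned columns of `U` would serve as the stable anchor directions for watermark injection; raising `energy` keeps a larger subspace (more robust carrier, more potential utility impact), which is the trade-off the adaptive truncation strategy tunes.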

🛡️ Threat Analysis

Model Theft

The watermark is embedded in the model weights themselves (internal representations / functional subspace) to prove ownership of the LLM. This is model IP protection, defending against model theft via fine-tuning, distillation, quantization, and redistribution.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
training_time
Applications
model ownership protection, llm intellectual property defense