defense 2025

ActiveMark: on watermarking of visual foundation models via massive activations

Anna Chistyakova, Mikhail Pautov

0 citations · 25 references · arXiv


Published on arXiv

arXiv:2510.04966

Model Theft

OWASP ML Top 10 — ML05

Key Finding

Watermarks embedded in the internal representations of a VFM remain detectable in functional copies obtained by fine-tuning, with provably low false-detection and misdetection probabilities across multiple VFM architectures.

ActiveMark

Novel technique introduced


Being trained on large-scale datasets, visual foundation models (VFMs) can be fine-tuned for diverse downstream tasks, achieving remarkable performance and efficiency in various computer vision applications. The high cost of data collection and training motivates the owners of some VFMs to distribute them under a license to protect their intellectual property rights. However, a dishonest user holding a copy of the protected model may illegally redistribute it, for example, to make a profit. As a consequence, the development of reliable ownership verification tools is of great importance today, since such methods can be used to differentiate between a redistributed copy of the protected model and an independent model. In this paper, we propose an approach to ownership verification of visual foundation models by fine-tuning a small set of expressive layers of a VFM along with a small encoder-decoder network to embed digital watermarks into an internal representation of a hold-out set of input images. Importantly, the embedded watermarks remain detectable in functional copies of the protected model, obtained, for example, by fine-tuning the VFM for a particular downstream task. Theoretically and experimentally, we demonstrate that the proposed method yields a low probability of false detection of a non-watermarked model and a low probability of misdetection of a watermarked model.
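The embed-then-decode idea in the abstract can be illustrated with a minimal sketch. The paper trains a small encoder-decoder network jointly with a few VFM layers; here, purely for illustration, both are replaced by a fixed linear map with orthonormal columns, and the feature dimension, watermark length, and embedding strength `eps` are all hypothetical choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 256, 32                            # feature dimension, watermark length (toy values)
bits = rng.choice([-1.0, 1.0], size=k)    # owner's binary watermark, encoded as +/-1

# Toy "encoder": a fixed matrix with orthonormal columns mapping the
# k watermark bits into a perturbation of the d-dimensional representation.
E, _ = np.linalg.qr(rng.standard_normal((d, k)))

features = rng.standard_normal(d)         # internal representation of a hold-out image
eps = 5.0                                 # embedding strength (hypothetical)
watermarked = features + eps * (E @ bits)

# Toy "decoder": project back onto the encoder directions and take signs.
# The clean feature leaks only unit-variance noise into each projection,
# so with eps = 5 the bit signs dominate and the watermark is recovered.
recovered = np.sign(E.T @ watermarked)
accuracy = float(np.mean(recovered == bits))
```

In the actual method both maps are learned so that the watermark survives fine-tuning of the surrounding layers, which a fixed linear code like this would not guarantee.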


Key Contributions

  • First watermarking method specifically targeting visual foundation models (VFMs), exploiting massive activations to identify expressive blocks suitable for watermark embedding
  • Encoder-decoder architecture that embeds binary watermarks into internal representations of a hold-out image set, with watermarks surviving downstream fine-tuning for classification and segmentation tasks
  • Theoretical upper bounds on false positive (non-watermarked model flagged) and false negative (watermarked functional copy missed) detection error probabilities
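The first contribution relies on massive activations to locate expressive blocks. One plausible selection criterion, sketched below under assumed toy data, is the ratio of a block's largest activation magnitude to its typical (median) magnitude; the function name and the threshold of 50 are hypothetical, not taken from the paper.

```python
import numpy as np

def expressive_blocks(block_activations, ratio_threshold=50.0):
    """Return indices of blocks whose largest activation magnitude is far
    above the typical (median) magnitude -- a simple proxy for the
    'massive activations' phenomenon observed in transformer blocks."""
    selected = []
    for i, acts in enumerate(block_activations):
        a = np.abs(acts)
        if a.max() / (np.median(a) + 1e-12) > ratio_threshold:
            selected.append(i)
    return selected

rng = np.random.default_rng(1)
# Four blocks of toy activations; block 2 gets an injected massive outlier.
blocks = [rng.standard_normal(1024) for _ in range(4)]
blocks[2][0] = 500.0

print(expressive_blocks(blocks))  # prints [2]
```

For standard-normal activations the max/median ratio stays in the single digits, so only the block with the injected outlier crosses the threshold.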

🛡️ Threat Analysis

Model Theft

ActiveMark embeds ownership watermarks into the model's internal representations (the hidden activations of expressive blocks), enabling ownership verification when a functional copy is redistributed. The watermark survives downstream fine-tuning, so a redistributed copy can be distinguished from an independently trained model. This is model IP protection via model-embedded watermarking, not output content watermarking.
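The paper proves upper bounds on both false detection and misdetection. The natural decision rule behind such bounds can be sketched as a one-sided binomial test: under the null hypothesis, an independent (non-watermarked) model matches each watermark bit with probability 1/2, so ownership is claimed only when the observed match count is improbably high. The significance level `alpha` below is a hypothetical choice, and this is an illustrative statistic, not the paper's exact verification procedure.

```python
from math import comb

def match_pvalue(matches, k):
    """P[X >= matches] for X ~ Binomial(k, 1/2): the probability that an
    independent (non-watermarked) model matches at least this many of the
    k watermark bits purely by chance."""
    return sum(comb(k, i) for i in range(matches, k + 1)) / 2**k

def is_watermarked(matches, k, alpha=1e-4):
    # Claim ownership only if chance agreement is extremely unlikely,
    # keeping the false-detection probability below alpha.
    return match_pvalue(matches, k) < alpha

k = 32
print(is_watermarked(30, k))  # suspected copy: 30/32 bits match -> True
print(is_watermarked(16, k))  # independent model: ~half match -> False
```

Misdetection is controlled from the other side: a watermarked copy whose decoder recovers most bits yields a match count far above k/2 with high probability, so the same threshold rarely misses it.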


Details

Domains
vision
Model Types
transformer, vlm
Threat Tags
training_time, black_box
Applications
visual foundation models, image classification, image segmentation, model IP protection