ActiveMark: on watermarking of visual foundation models via massive activations
Anna Chistyakova, Mikhail Pautov
Published on arXiv
2510.04966
Model Theft
OWASP ML Top 10 — ML05
Key Finding
Proposed watermarks embedded in VFM internal representations remain detectable in functional copies obtained by fine-tuning, with provably low false detection and misdetection probabilities across multiple VFM architectures.
ActiveMark
Novel technique introduced
Trained on large-scale datasets, visual foundation models (VFMs) can be fine-tuned for diverse downstream tasks, achieving remarkable performance and efficiency across computer vision applications. The high cost of data collection and training motivates the owners of some VFMs to distribute them under license to protect their intellectual property rights. However, a dishonest user of a protected model's copy may illegally redistribute it, for example, to make a profit. Reliable ownership verification tools are therefore of great importance, since such methods can differentiate a redistributed copy of the protected model from an independent model. In this paper, we propose an approach to ownership verification of visual foundation models that fine-tunes a small set of expressive layers of a VFM together with a small encoder-decoder network to embed digital watermarks into the internal representations of a hold-out set of input images. Importantly, the embedded watermarks remain detectable in functional copies of the protected model, obtained, for example, by fine-tuning the VFM for a particular downstream task. Theoretically and experimentally, we demonstrate that the proposed method yields a low probability of false detection of a non-watermarked model and a low probability of misdetection of a watermarked model.
Key Contributions
- First watermarking method specifically targeting visual foundation models (VFMs), exploiting massive activations to identify expressive blocks suitable for watermark embedding
- Encoder-decoder architecture that embeds binary watermarks into internal representations of a hold-out image set, with watermarks surviving downstream fine-tuning for classification and segmentation tasks
- Theoretical upper bounds on false positive (non-watermarked model flagged) and false negative (watermarked functional copy missed) detection error probabilities
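The flavor of the false-positive bound can be illustrated with a standard binomial tail argument (a simplification for intuition; the paper's actual bounds, watermark length, and decision threshold are not reproduced here). If a non-watermarked model decodes each of `k` watermark bits essentially at random, the probability that at least `tau` bits match by chance is an upper binomial tail:

```python
from math import comb

def false_positive_bound(k: int, tau: int, p: float = 0.5) -> float:
    """Probability that an independent (non-watermarked) model matches at
    least `tau` of `k` watermark bits by chance, assuming each bit is
    decoded independently with match probability `p` (illustrative model,
    not the paper's exact analysis)."""
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(tau, k + 1))

# Example: a hypothetical 64-bit watermark verified with a 75% match threshold.
fp = false_positive_bound(k=64, tau=48)
print(f"chance-match probability <= {fp:.2e}")
```

Under this toy model, raising the threshold or lengthening the watermark drives the false-positive probability down exponentially, which is why even moderate watermark lengths suffice for confident ownership claims.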
🛡️ Threat Analysis
ActiveMark embeds ownership watermarks into the model's internal representations (hidden activations of its expressive blocks), enabling ownership verification when a functional copy is redistributed. The watermark survives downstream fine-tuning, allowing differentiation between a redistributed copy of the protected model and an independently trained model. This is model IP protection via model-embedded watermarking, not output content watermarking.
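The embed-into-activations idea can be sketched with a minimal toy example. A fixed random linear carrier stands in for ActiveMark's learned encoder-decoder (which the paper trains jointly with a few expressive VFM layers), and a random vector stands in for a real hidden representation; all names and parameters here are illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 256, 32                       # hidden width, watermark length (toy values)

# Random "carrier" directions playing the role of the learned
# encoder-decoder pair; rows have roughly unit norm.
W = rng.standard_normal((k, d)) / np.sqrt(d)

def embed(h: np.ndarray, bits: np.ndarray, eps: float = 1.0) -> np.ndarray:
    """Encoder stand-in: add a small signed perturbation along each carrier."""
    signs = 2.0 * bits - 1.0         # {0,1} -> {-1,+1}
    return h + eps * signs @ W

def decode(h: np.ndarray) -> np.ndarray:
    """Decoder stand-in: read each bit as the sign of the carrier projection."""
    return (h @ W.T > 0).astype(int)

bits = rng.integers(0, 2, size=k)    # owner's binary watermark
h = rng.standard_normal(d) * 0.1     # toy "internal representation"
h_wm = embed(h, bits)

recovered = decode(h_wm)
print("bit agreement:", (recovered == bits).mean())
```

Verification then reduces to checking whether the bit agreement exceeds a threshold chosen from the false-positive analysis; in the real method the perturbation is learned to be both recoverable and robust to downstream fine-tuning, which this linear toy does not model.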