ActiveMark: on watermarking of visual foundation models via massive activations
Anna Chistyakova, Mikhail Pautov
Published on arXiv
2510.04966
Model Theft
OWASP ML Top 10 — ML05
Key Finding
Proposed watermarks embedded in VFM internal representations remain detectable in functional copies obtained by fine-tuning, with provably low false detection and misdetection probabilities across multiple VFM architectures.
ActiveMark
Novel technique introduced
Trained on large-scale datasets, visual foundation models (VFMs) can be fine-tuned for diverse downstream tasks, achieving remarkable performance and efficiency across computer vision applications. The high cost of data collection and training motivates the owners of some VFMs to distribute them under license to protect their intellectual property rights. However, a dishonest user of a protected model's copy may illegally redistribute it, for example, to make a profit. Reliable ownership verification tools are therefore of great importance, since such methods can differentiate a redistributed copy of the protected model from an independent model. In this paper, we propose an approach to ownership verification of visual foundation models that fine-tunes a small set of expressive layers of a VFM together with a small encoder-decoder network to embed digital watermarks into the internal representations of a hold-out set of input images. Importantly, the embedded watermarks remain detectable in functional copies of the protected model, obtained, for example, by fine-tuning the VFM for a particular downstream task. Theoretically and experimentally, we demonstrate that the proposed method yields a low probability of false detection of a non-watermarked model and a low probability of misdetection of a watermarked model.
Key Contributions
- First watermarking method specifically targeting visual foundation models (VFMs), exploiting massive activations to identify expressive blocks suitable for watermark embedding
- Encoder-decoder architecture that embeds binary watermarks into internal representations of a hold-out image set, with watermarks surviving downstream fine-tuning for classification and segmentation tasks
- Theoretical upper bounds on false positive (non-watermarked model flagged) and false negative (watermarked functional copy missed) detection error probabilities
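The flavor of the false-positive bound can be illustrated with a standard binomial tail argument (a simplification for intuition; the paper's actual bounds, watermark length, and decision threshold are not reproduced here). If a non-watermarked model decodes each of `k` watermark bits essentially at random, the probability that at least `tau` bits match by chance is an upper binomial tail:

```python
from math import comb

def false_positive_bound(k: int, tau: int, p: float = 0.5) -> float:
    """Probability that an independent (non-watermarked) model matches at
    least `tau` of `k` watermark bits by chance, assuming each bit is
    decoded independently with match probability `p` (illustrative model,
    not the paper's exact analysis)."""
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(tau, k + 1))

# Example: a hypothetical 64-bit watermark verified with a 75% match threshold.
fp = false_positive_bound(k=64, tau=48)
print(f"chance-match probability <= {fp:.2e}")
```

Under this toy model, raising the threshold or lengthening the watermark drives the false-positive probability down exponentially, which is why even moderate watermark lengths suffice for confident ownership claims.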
🛡️ Threat Analysis
ActiveMark embeds ownership watermarks into the model's internal representations (hidden activations of its expressive blocks), enabling ownership verification when a functional copy is redistributed. The watermark survives downstream fine-tuning, allowing differentiation between a redistributed copy of the protected model and an independently trained model. This is model IP protection via model-embedded watermarking, not output content watermarking.
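The embed-into-activations idea can be sketched with a minimal toy example. A fixed random linear carrier stands in for ActiveMark's learned encoder-decoder (which the paper trains jointly with a few expressive VFM layers), and a random vector stands in for a real hidden representation; all names and parameters here are illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 256, 32                       # hidden width, watermark length (toy values)

# Random "carrier" directions playing the role of the learned
# encoder-decoder pair; rows have roughly unit norm.
W = rng.standard_normal((k, d)) / np.sqrt(d)

def embed(h: np.ndarray, bits: np.ndarray, eps: float = 1.0) -> np.ndarray:
    """Encoder stand-in: add a small signed perturbation along each carrier."""
    signs = 2.0 * bits - 1.0         # {0,1} -> {-1,+1}
    return h + eps * signs @ W

def decode(h: np.ndarray) -> np.ndarray:
    """Decoder stand-in: read each bit as the sign of the carrier projection."""
    return (h @ W.T > 0).astype(int)

bits = rng.integers(0, 2, size=k)    # owner's binary watermark
h = rng.standard_normal(d) * 0.1     # toy "internal representation"
h_wm = embed(h, bits)

recovered = decode(h_wm)
print("bit agreement:", (recovered == bits).mean())
```

Verification then reduces to checking whether the bit agreement exceeds a threshold chosen from the false-positive analysis; in the real method the perturbation is learned to be both recoverable and robust to downstream fine-tuning, which this linear toy does not model.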