Defense · 2025

Model Unmerging: Making Your Models Unmergeable for Secure Model Sharing

Zihao Wang 1, Enneng Yang 1, Lu Yin 2, Shiwei Liu 3, Li Shen 1



Published on arXiv: 2509.01548

Model Theft

OWASP ML Top 10 — ML05

Key Finding

MergeLock degrades merged model performance by over 95% in most tested scenarios while keeping the protected model's own outputs unchanged, and protected models resist low-cost recovery attacks after merging.

MergeLock

Novel technique introduced


Model merging leverages multiple fine-tuned expert models to construct a multi-task model at low cost, and is gaining increasing attention. However, as a growing number of fine-tuned models become publicly available, concerns about the safety of model merging have emerged. Unauthorized merging may infringe on developers' rights and risk leaking sensitive personal information. Most existing methods focus on detecting whether a merged model originates from a specific source model, but fail to effectively prevent illegal merging. In this paper, we propose MergeLock, an active protection mechanism that disrupts model parameters to render them unmergeable, thereby directly preventing unauthorized model merging. Specifically, leveraging the inherent symmetry of the attention mechanism in Transformer-based models, we randomly sample two pairs of invertible matrices and apply them to the Query-Key (QK) and Value-Output (VO) branches. This transformation keeps the model's output unchanged while pushing its parameters away from the shared parameter space of other fine-tuned models. Extensive experiments across both vision and language tasks demonstrate that MergeLock degrades the performance of merged models by over 95% in most cases when a protected model is involved. Moreover, merged models protected by MergeLock cannot be effectively recovered using low-cost restoration methods, further enhancing robustness against unauthorized merging. The code is available at https://github.com/hetailang/Merge-Lock.


Key Contributions

  • MergeLock: an active protection mechanism that applies invertible matrix transformations to QK and VO attention branches, preserving model functionality while pushing weights out of the shared parameter space used by model merging algorithms
  • Demonstrates over 95% performance degradation in merged models across vision and language tasks whenever a protected model is involved
  • Shows robustness against low-cost restoration/recovery attempts on MergeLock-protected merged models
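The attention symmetry the contributions above rely on can be sketched in NumPy. This is a toy single-head attention without biases, not the paper's implementation; the matrix names `A` and `B` and all dimensions are illustrative. Right-multiplying W_Q by an invertible A and W_K by A^{-T} leaves the product Q Kᵀ unchanged, and likewise B and B^{-1} cancel inside the V→O path, so the locked model computes the exact same function with very different parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# Toy single-head attention weights (no biases) -- illustrative only.
W_q, W_k, W_v, W_o = (rng.standard_normal((d, d)) for _ in range(4))
x = rng.standard_normal((5, d))  # 5 token embeddings

def attention(x, W_q, W_k, W_v, W_o):
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(d)
    # Row-wise softmax over attention scores.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v @ W_o

# MergeLock-style transform: random invertible matrices on QK and VO branches.
# (A random Gaussian matrix is invertible with probability 1.)
A = rng.standard_normal((d, d))
B = rng.standard_normal((d, d))
Wq2 = W_q @ A
Wk2 = W_k @ np.linalg.inv(A).T   # A and A^{-1} cancel inside Q K^T
Wv2 = W_v @ B
Wo2 = np.linalg.inv(B) @ W_o     # B and B^{-1} cancel inside V W_o

out_orig = attention(x, W_q, W_k, W_v, W_o)
out_lock = attention(x, Wq2, Wk2, Wv2, Wo2)
assert np.allclose(out_orig, out_lock)   # function preserved
assert not np.allclose(W_q, Wq2)         # parameters moved far away
```

The same cancellation argument applies per attention head in a real Transformer; the sketch only shows the single-head case.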

🛡️ Threat Analysis

Model Theft

Model merging is an unauthorized exploitation vector analogous to knowledge distillation-as-theft — adversaries use publicly shared fine-tuned weights to construct new multi-task models without permission. MergeLock is an anti-exploitation defense (like anti-distillation) that protects model IP by making weights incompatible with merging, fitting ML05's scope of protecting model intellectual property from unauthorized use.
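To see why merging a protected model fails, consider naive weight averaging as a stand-in for a merging algorithm (a hypothetical, simplified merger; real merging methods are more elaborate, but share the assumption that expert weights live in a common parameter space). Averaging an original attention branch with its MergeLock-transformed copy already destroys the Q Kᵀ product that both copies individually compute:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
W_q, W_k = rng.standard_normal((d, d)), rng.standard_normal((d, d))

# Locked copy of the same QK branch (function-preserving in isolation).
A = rng.standard_normal((d, d))
Wq_lock, Wk_lock = W_q @ A, W_k @ np.linalg.inv(A).T

# Naive weight averaging, as a simple merging scheme would do.
Wq_merged = (W_q + Wq_lock) / 2
Wk_merged = (W_k + Wk_lock) / 2

# Both the original and the locked branch compute W_q W_k^T exactly,
# but the averaged weights compute W_q (2I + A + A^{-1}) W_k^T / 4 instead.
orig = W_q @ W_k.T
assert np.allclose(orig, Wq_lock @ Wk_lock.T)      # lock preserves function
merged = Wq_merged @ Wk_merged.T
assert not np.allclose(merged, orig)               # merging is broken
```

The cross terms W_q A W_kᵀ and W_q A^{-1} W_kᵀ do not cancel under averaging, which is the basic reason merged performance collapses rather than interpolating between the experts.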


Details

Domains
vision, nlp
Model Types
transformer
Threat Tags
training_time, white_box
Datasets
CIFAR-10, CIFAR-100, ImageNet, ViT benchmarks, NLP task benchmarks
Applications
model IP protection, fine-tuned model sharing, multi-task model construction