Defense · 2025

Model Unmerging: Making Your Models Unmergeable for Secure Model Sharing

Zihao Wang 1, Enneng Yang 1, Lu Yin 2, Shiwei Liu 3, Li Shen 1



Published on arXiv: 2509.01548

Model Theft

OWASP ML Top 10 — ML05

Key Finding

MergeLock degrades merged model performance by over 95% in most tested scenarios while keeping the protected model's own outputs unchanged, and protected models resist low-cost recovery attacks after merging.

MergeLock

Novel technique introduced


Model merging leverages multiple fine-tuned expert models to construct a multi-task model at low cost, and is gaining increasing attention. However, as a growing number of fine-tuned models become publicly available, concerns about the safety of model merging have emerged. Unauthorized merging may infringe on developers' rights and risk leaking sensitive personal information. Most existing methods focus on detecting whether a merged model originates from a specific source model, but fail to effectively prevent illegal merging. In this paper, we propose MergeLock, an active protection mechanism that disrupts model parameters to render them unmergeable, thereby directly preventing unauthorized model merging. Specifically, leveraging the inherent symmetry of the attention mechanism in Transformer-based models, we randomly sample two pairs of invertible matrices and apply them to the Query-Key (QK) and Value-Output (VO) branches. This transformation keeps the model's output unchanged while pushing its parameters away from the shared parameter space of other fine-tuned models. Extensive experiments across both vision and language tasks demonstrate that MergeLock degrades the performance of merged models by over 95% in most cases when a protected model is involved. Moreover, merged models protected by MergeLock cannot be effectively recovered using low-cost restoration methods, further enhancing robustness against unauthorized merging. The code is available at https://github.com/hetailang/Merge-Lock.


Key Contributions

  • MergeLock: an active protection mechanism that applies invertible matrix transformations to QK and VO attention branches, preserving model functionality while pushing weights out of the shared parameter space used by model merging algorithms
  • Demonstrates over 95% performance degradation in merged models across vision and language tasks whenever a protected model is involved
  • Shows robustness against low-cost restoration/recovery attempts on MergeLock-protected merged models
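The attention symmetry the contributions above rely on can be sketched in NumPy. This is a toy single-head attention without biases, not the paper's implementation; the matrix names `A` and `B` and all dimensions are illustrative. Right-multiplying W_Q by an invertible A and W_K by A^{-T} leaves the product Q Kᵀ unchanged, and likewise B and B^{-1} cancel inside the V→O path, so the locked model computes the exact same function with very different parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# Toy single-head attention weights (no biases) -- illustrative only.
W_q, W_k, W_v, W_o = (rng.standard_normal((d, d)) for _ in range(4))
x = rng.standard_normal((5, d))  # 5 token embeddings

def attention(x, W_q, W_k, W_v, W_o):
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(d)
    # Row-wise softmax over attention scores.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v @ W_o

# MergeLock-style transform: random invertible matrices on QK and VO branches.
# (A random Gaussian matrix is invertible with probability 1.)
A = rng.standard_normal((d, d))
B = rng.standard_normal((d, d))
Wq2 = W_q @ A
Wk2 = W_k @ np.linalg.inv(A).T   # A and A^{-1} cancel inside Q K^T
Wv2 = W_v @ B
Wo2 = np.linalg.inv(B) @ W_o     # B and B^{-1} cancel inside V W_o

out_orig = attention(x, W_q, W_k, W_v, W_o)
out_lock = attention(x, Wq2, Wk2, Wv2, Wo2)
assert np.allclose(out_orig, out_lock)   # function preserved
assert not np.allclose(W_q, Wq2)         # parameters moved far away
```

The same cancellation argument applies per attention head in a real Transformer; the sketch only shows the single-head case.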

🛡️ Threat Analysis

Model Theft

Model merging is an unauthorized exploitation vector analogous to knowledge distillation-as-theft — adversaries use publicly shared fine-tuned weights to construct new multi-task models without permission. MergeLock is an anti-exploitation defense (like anti-distillation) that protects model IP by making weights incompatible with merging, fitting ML05's scope of protecting model intellectual property from unauthorized use.
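To see why merging a protected model fails, consider naive weight averaging as a stand-in for a merging algorithm (a hypothetical, simplified merger; real merging methods are more elaborate, but share the assumption that expert weights live in a common parameter space). Averaging an original attention branch with its MergeLock-transformed copy already destroys the Q Kᵀ product that both copies individually compute:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
W_q, W_k = rng.standard_normal((d, d)), rng.standard_normal((d, d))

# Locked copy of the same QK branch (function-preserving in isolation).
A = rng.standard_normal((d, d))
Wq_lock, Wk_lock = W_q @ A, W_k @ np.linalg.inv(A).T

# Naive weight averaging, as a simple merging scheme would do.
Wq_merged = (W_q + Wq_lock) / 2
Wk_merged = (W_k + Wk_lock) / 2

# Both the original and the locked branch compute W_q W_k^T exactly,
# but the averaged weights compute W_q (2I + A + A^{-1}) W_k^T / 4 instead.
orig = W_q @ W_k.T
assert np.allclose(orig, Wq_lock @ Wk_lock.T)      # lock preserves function
merged = Wq_merged @ Wk_merged.T
assert not np.allclose(merged, orig)               # merging is broken
```

The cross terms W_q A W_kᵀ and W_q A^{-1} W_kᵀ do not cancel under averaging, which is the basic reason merged performance collapses rather than interpolating between the experts.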


Details

Domains
vision, nlp
Model Types
transformer
Threat Tags
training_time, white_box
Datasets
CIFAR-10, CIFAR-100, ImageNet, ViT benchmarks, NLP task benchmarks
Applications
model IP protection, fine-tuned model sharing, multi-task model construction