defense 2025

Do Not Merge My Model! Safeguarding Open-Source LLMs Against Unauthorized Model Merging

1 citation · 34 references · arXiv

Published on arXiv: 2511.10712

OWASP ML Top 10 (ML05): Model Theft

OWASP LLM Top 10 (LLM10): Model Theft

Key Finding

MergeBarrier disrupts Linear Mode Connectivity to proactively prevent unauthorized model merging while maintaining negligible accuracy loss on the protected model.

Novel technique introduced: MergeBarrier

Model merging has emerged as an efficient technique for expanding large language models (LLMs) by integrating specialized expert models. However, it also introduces a new threat: model merging stealing, where free-riders exploit models through unauthorized model merging. Unfortunately, existing defense mechanisms fail to provide effective protection. Specifically, we identify three critical protection properties that existing methods fail to simultaneously satisfy: (1) proactively preventing unauthorized merging; (2) ensuring compatibility with general open-source settings; (3) achieving high security with negligible performance loss. To address the above issues, we propose MergeBarrier, a plug-and-play defense that proactively prevents unauthorized merging. The core design of MergeBarrier is to disrupt the Linear Mode Connectivity (LMC) between the protected model and its homologous counterparts, thereby eliminating the low-loss path required for effective model merging. Extensive experiments show that MergeBarrier effectively prevents model merging stealing with negligible accuracy loss.
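To make the notion of a "low-loss path" concrete, here is a minimal sketch of how a loss barrier along the straight line between two weight vectors can be measured. It is not the paper's implementation: the double-well `toy_loss`, the helper name `loss_barrier`, and the specific barrier definition (largest excess of the interpolated loss over the linearly interpolated endpoint losses) are all assumptions chosen for illustration. Two models are usually called linearly mode connected when this barrier is close to zero.

```python
import numpy as np

def toy_loss(theta):
    # Hypothetical double-well loss: each coordinate has minima at +1 and -1.
    return float(np.mean((theta ** 2 - 1.0) ** 2))

def loss_barrier(theta_a, theta_b, loss_fn, num_points=21):
    # Largest gap between the loss on the straight line from theta_a to
    # theta_b and the linear interpolation of the two endpoint losses.
    loss_a, loss_b = loss_fn(theta_a), loss_fn(theta_b)
    gaps = []
    for alpha in np.linspace(0.0, 1.0, num_points):
        theta = (1.0 - alpha) * theta_a + alpha * theta_b
        baseline = (1.0 - alpha) * loss_a + alpha * loss_b
        gaps.append(loss_fn(theta) - baseline)
    return max(gaps)

# Two points in the same basin: the path stays low-loss, so the barrier is ~0.
same_basin = loss_barrier(np.array([0.9, 0.9]), np.array([1.1, 1.1]), toy_loss)

# Two points in different basins: the path crosses a high-loss ridge at 0.
different_basin = loss_barrier(np.array([-1.0, -1.0]), np.array([1.0, 1.0]), toy_loss)

print(f"same basin barrier:      {same_basin:.4f}")
print(f"different basin barrier: {different_basin:.4f}")
```

In this toy setting, a defense in the spirit of the abstract would aim to move the protected model from the first situation to the second relative to its homologous counterparts, while leaving its own endpoint loss essentially unchanged.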

Key Contributions

Identifies three critical properties (proactivity, compatibility, security with utility) that existing defenses against model merging stealing fail to simultaneously satisfy
Proposes MergeBarrier, a plug-and-play defense that disrupts Linear Mode Connectivity (LMC) between the protected model and its homologous counterparts, eliminating the low-loss path needed for effective model merging
Demonstrates that MergeBarrier proactively prevents unauthorized model merging with negligible accuracy loss in an open-source-compatible setting

🛡️ Threat Analysis

Model Theft

The paper's primary contribution is defending against 'model merging stealing' — a form of model IP theft where free-riders incorporate open-access proprietary models into their own via model merging. MergeBarrier is an IP protection mechanism that prevents unauthorized cloning/reuse of a model's learned functionality, directly analogous to model watermarking or anti-extraction defenses.
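For readers unfamiliar with the attack surface, the sketch below shows one common way models fine-tuned from the same base are merged. The summary above does not say which merging algorithms the paper evaluates; task arithmetic is used here only as a familiar example, and the function name `task_arithmetic_merge`, the `scale` parameter, and the tiny weight dictionaries are hypothetical.

```python
import numpy as np

def task_arithmetic_merge(base, experts, scale=1.0):
    # Add the scaled sum of each expert's offset from the shared base
    # (its "task vector") back onto the base weights, tensor by tensor.
    merged = {}
    for name, base_w in base.items():
        task_vectors = [expert[name] - base_w for expert in experts]
        merged[name] = base_w + scale * sum(task_vectors)
    return merged

# Toy weights: two experts that each changed a different coordinate.
base = {"w": np.array([1.0, 2.0])}
expert_a = {"w": np.array([1.5, 2.0])}
expert_b = {"w": np.array([1.0, 3.0])}

merged = task_arithmetic_merge(base, [expert_a, expert_b])
print(merged["w"])
```

Merging of this kind needs only the released weights, with no training data or gradient access, which is why the threat applies to any openly released checkpoint.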

Details

Domains

nlp

Model Types

llm, transformer

Threat Tags

white_box, training_time, targeted

Applications

llm ip protection, open-source model licensing enforcement

Similar Papers

EditMark: Watermarking Large Language Models based on Model Editing

AWM: Accurate Weight-Matrix Fingerprint for Large Language Models

SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From

Key-Conditioned Orthonormal Transform Gating (K-OTG): Multi-Key Access Control with Hidden-State Scrambling for LoRA-Tuned Models

Exploiting the Experts: Unauthorized Compression in MoE-LLMs

FPEdit: Robust LLM Fingerprinting through Localized Parameter Editing

SEAL: Subspace-Anchored Watermarks for LLM Ownership

A Behavioral Fingerprint for Large Language Models: Provenance Tracking via Refusal Vectors