Defending Unauthorized Model Merging via Dual-Stage Weight Protection
Wei-Jia Chen 1, Min-Yen Tsai 1, Cheng-Yi Lee 2, Chia-Mu Yu 1
Published on arXiv (arXiv:2511.11851)
Model Theft
OWASP ML Top 10 — ML05
Key Finding
MergeGuard reduces merged model accuracy by up to 90% while incurring less than 1.5% performance loss on the protected model across both vision (ViT-L-14) and language (Llama2, Gemma2, Mistral) models.
MergeGuard
Novel technique introduced
The rapid proliferation of pretrained models and open repositories has made model merging a convenient yet risky practice, allowing free-riders to combine fine-tuned models into a new multi-capability model without authorization. Such unauthorized model merging not only violates intellectual property rights but also undermines model ownership and accountability. To address this issue, we present MergeGuard, a proactive dual-stage weight protection framework that disrupts merging compatibility while maintaining task fidelity. In the first stage, we redistribute task-relevant information across layers via L2-regularized optimization, ensuring that important gradients are evenly dispersed. In the second stage, we inject structured perturbations to misalign task subspaces, breaking curvature compatibility in the loss landscape. Together, these stages reshape the model's parameter geometry such that merged models collapse into destructive interference while the protected model remains fully functional. Extensive experiments on both vision (ViT-L-14) and language (Llama2, Gemma2, Mistral) models demonstrate that MergeGuard reduces merged model accuracy by up to 90% with less than 1.5% performance loss on the protected model.
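Why misaligned task subspaces cause merged models to collapse can be illustrated with task arithmetic, a common merging scheme in which each model's delta from a shared pretrained base is summed. The sketch below is a toy illustration, not MergeGuard itself: `misalign` stands in for Stage 2's structured perturbation using a simple random rotation of the task delta (the paper's actual perturbation is structured so the protected model stays functional, which a blind rotation would not guarantee).

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 512

# Toy "models" as flat parameter vectors sharing a pretrained base --
# the setting that task-arithmetic merging exploits.
base = rng.normal(size=dim)
task_a = base + 0.1 * rng.normal(size=dim)  # fine-tuned on task A
task_b = base + 0.1 * rng.normal(size=dim)  # fine-tuned on task B

def merge(base, models):
    """Task-arithmetic merge: add each model's delta from the base."""
    return base + sum(m - base for m in models)

def misalign(base, model, seed=1):
    """Illustrative stand-in for Stage 2: rotate the task delta into a
    random subspace so it no longer aligns with the base geometry."""
    g = np.random.default_rng(seed).normal(size=(dim, dim))
    q, _ = np.linalg.qr(g)  # random orthogonal matrix
    return base + q @ (model - base)

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

protected_a = misalign(base, task_a)
naive = merge(base, [task_a, task_b])
attacked = merge(base, [protected_a, task_b])

print(cos(naive - base, task_a - base))     # large: task A survives merging
print(cos(attacked - base, task_a - base))  # near zero: merged delta loses task A
```

In high dimensions, the rotated delta is nearly orthogonal to the original, so the merged model retains almost none of task A. The hard part, which this toy skips, is constructing a perturbation that misaligns the subspace while preserving the protected model's own behavior, which is exactly the constraint MergeGuard's structured perturbations are designed to satisfy.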
Key Contributions
- Stage 1: L2-regularized optimization that redistributes task-relevant information evenly across layers to disrupt merging compatibility
- Stage 2: Structured perturbations that misalign task subspaces and break curvature compatibility in the loss landscape
- MergeGuard framework that reduces merged model accuracy by up to 90% with less than 1.5% degradation on the protected model, validated on ViT-L-14, Llama2, Gemma2, and Mistral
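Stage 1's idea of dispersing task-relevant information across layers without hurting fidelity can be sketched with a two-layer linear model, where the rescaling (c·W1, W2/c) leaves the composed function unchanged. The 1-D grid search below is a hypothetical stand-in for the paper's L2-regularized optimization; all names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32

# Two-layer linear model f(x) = W2 @ W1 @ x, with the fine-tuning delta
# concentrated entirely in the second layer.
W1 = rng.normal(size=(d, d)) / np.sqrt(d)
W2 = rng.normal(size=(d, d)) / np.sqrt(d)
W1_ft = W1.copy()
W2_ft = W2 + 0.3 * rng.normal(size=(d, d)) / np.sqrt(d)

def energies(A1, A2):
    """Per-layer L2 energy of the delta from the base weights."""
    return np.array([((A1 - W1) ** 2).sum(), ((A2 - W2) ** 2).sum()])

# The rescaling (c*W1, W2/c) leaves W2 @ W1 unchanged, so delta energy can
# be moved between layers at zero fidelity cost. A 1-D grid search picks c
# to balance per-layer energies (stand-in for L2-regularized optimization).
cs = np.linspace(0.5, 2.0, 301)
imbalance = [abs(np.subtract(*energies(c * W1_ft, W2_ft / c))) for c in cs]
c = cs[int(np.argmin(imbalance))]
P1, P2 = c * W1_ft, W2_ft / c

assert np.allclose(P2 @ P1, W2_ft @ W1_ft)  # function exactly preserved
print(energies(W1_ft, W2_ft))  # all delta energy sits in layer 2
print(energies(P1, P2))        # roughly equal across both layers
```

A merger relying on per-layer alignment can no longer find a single layer holding the task; in the real method the same effect is achieved for nonlinear networks by optimizing the weights directly under an L2 regularizer rather than by exact reparameterization.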
🛡️ Threat Analysis
MergeGuard defends against the unauthorized extraction of model capabilities through merging, an intellectual-property theft scenario. The defense reshapes the model's parameter geometry to prevent capability transfer, analogous to the anti-distillation techniques listed under ML05, and the paper explicitly frames it as protecting model ownership and IP rights.