Towards Building Non-Fine-Tunable Foundation Models
Ziyao Wang 1, Nizhang Li 2, Pingzhi Li 3, Guoheng Sun 1, Tianlong Chen 3, Ang Li 1
Published on arXiv
2602.00446
Transfer Learning Attack
OWASP ML Top 10 — ML07
Model Theft
OWASP ML Top 10 — ML05
Key Finding
PMP preserves base LLM performance while consistently degrading unauthorized fine-tuning across a wide range of downstream tasks, with resistance strength controlled by the mask sparsity ratio.
Private Mask Pre-Training (PMP)
Novel technique introduced
Open-sourcing foundation models (FMs) enables broad reuse but also exposes model trainers to economic and safety risks from unrestricted downstream fine-tuning. We address this problem by building non-fine-tunable foundation models: models that remain broadly usable in their released form while yielding limited adaptation gains under task-agnostic unauthorized fine-tuning. We propose Private Mask Pre-Training (PMP), a pre-training framework that concentrates representation learning into a sparse subnetwork identified early in training. The binary mask defining this subnetwork is kept private, and only the final dense weights are released. Unauthorized fine-tuning, lacking access to the mask, is therefore forced to update parameters misaligned with the pre-training subspace, inducing an intrinsic mismatch between the fine-tuning objective and the pre-training geometry. We provide a theoretical analysis showing that this mismatch destabilizes gradient-based adaptation and bounds fine-tuning gains. Empirical results on large language models demonstrate that PMP preserves base-model performance while consistently degrading unauthorized fine-tuning across a wide range of downstream tasks, with the strength of non-fine-tunability controlled by the mask ratio.
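The core mechanism can be illustrated with a minimal sketch: a fixed private binary mask gates which parameters receive gradient updates during pre-training, while the full dense weight tensor is what gets released. All names (`make_private_mask`, `pmp_step`) and the plain-SGD update are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_private_mask(shape, sparsity, rng):
    """Binary mask keeping roughly (1 - sparsity) of the parameters
    trainable; the mask is held by the trainer and never released."""
    return (rng.random(shape) >= sparsity).astype(np.float64)

def pmp_step(weights, grad, mask, lr=0.1):
    """One masked pre-training step: only the private subnetwork is
    updated; entries outside the mask stay frozen at their init values."""
    return weights - lr * (mask * grad)

# Toy weight matrix standing in for one layer of the foundation model.
w = rng.normal(size=(4, 4))
mask = make_private_mask(w.shape, sparsity=0.5, rng=rng)
g = rng.normal(size=w.shape)

w_new = pmp_step(w, g, mask)

# Parameters outside the private subnetwork are untouched by pre-training,
# yet the released tensor w_new is dense and reveals no mask structure.
assert np.allclose(w_new[mask == 0], w[mask == 0])
```

An unauthorized fine-tuner who receives only `w_new` cannot restrict updates to the subnetwork that actually carries the learned representations, which is the geometric mismatch the paper's analysis exploits.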
Key Contributions
- Private Mask Pre-Training (PMP): concentrates representation learning into a sparse subnetwork whose binary mask is kept secret, releasing only the dense weights to frustrate unauthorized fine-tuning
- Theoretical analysis showing that fine-tuning without the mask induces a geometric mismatch with the pre-training subspace that destabilizes gradient-based adaptation and bounds fine-tuning gains
- Empirical demonstration on LLMs that PMP preserves base model capability while consistently degrading unauthorized fine-tuning across diverse downstream tasks, with non-fine-tunability strength tunable via mask ratio
🛡️ Threat Analysis
Beyond safety, the paper is motivated by protecting the economic IP value of open-source models from unauthorized adaptation, analogous to anti-distillation and anti-cloning defenses. PMP functions as a structural deterrent against value extraction through unauthorized downstream specialization, a form of model IP protection.
PMP directly defends against the transfer-learning/fine-tuning attack: it renders gradient-based adaptation ineffective for parties lacking the private subnetwork mask by exploiting a geometric mismatch between fine-tuning updates and the pre-training subspace. The threat model is explicitly unauthorized fine-tuning (e.g., LoRA or full fine-tuning) of a released model.