defense 2026

Towards Building Non-Fine-Tunable Foundation Models

Ziyao Wang 1, Nizhang Li 2, Pingzhi Li 3, Guoheng Sun 1, Tianlong Chen 3, Ang Li 1

0 citations · 26 references · arXiv

Published on arXiv · 2602.00446

Transfer Learning Attack

OWASP ML Top 10 — ML07

Model Theft

OWASP ML Top 10 — ML05

Key Finding

PMP preserves base LLM performance while consistently degrading unauthorized fine-tuning across a wide range of downstream tasks, with resistance strength controlled by the mask sparsity ratio.

Private Mask Pre-Training (PMP)

Novel technique introduced


Open-sourcing foundation models (FMs) enables broad reuse but also exposes model trainers to economic and safety risks from unrestricted downstream fine-tuning. We address this problem by building non-fine-tunable foundation models: models that remain broadly usable in their released form while yielding limited adaptation gains under task-agnostic unauthorized fine-tuning. We propose Private Mask Pre-Training (PMP), a pre-training framework that concentrates representation learning into a sparse subnetwork identified early in training. The binary mask defining this subnetwork is kept private, and only the final dense weights are released. Unauthorized fine-tuning without access to the mask is therefore forced to update parameters misaligned with the pre-training subspace, inducing an intrinsic mismatch between the fine-tuning objective and the pre-training geometry. We provide theoretical analysis showing that this mismatch destabilizes gradient-based adaptation and bounds fine-tuning gains. Empirical results on large language models demonstrate that PMP preserves base model performance while consistently degrading unauthorized fine-tuning across a wide range of downstream tasks, with the strength of non-fine-tunability controlled by the mask ratio.
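The core mechanism can be illustrated with a toy numerical sketch. This is not the paper's implementation: the least-squares objective, the function names, and the uniform-random mask are illustrative assumptions; the point is only that gradient updates are restricted to a secret sparse subnetwork, while the released artifact is the dense weight vector.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_private_mask(shape, sparsity, rng):
    """Binary mask selecting a sparse subnetwork; kept secret by the trainer."""
    return (rng.random(shape) < sparsity).astype(np.float64)

def pmp_update(weights, grad, mask, lr=0.1):
    """PMP-style step (sketch): gradients flow only into the masked
    subnetwork, concentrating representation learning there."""
    return weights - lr * (grad * mask)

# Toy "pre-training" on a least-squares objective (illustrative stand-in
# for the real pre-training loss).
d = 64
w = np.zeros(d)
mask = make_private_mask(d, sparsity=0.25, rng=rng)
w_star = rng.normal(size=d)          # target weights for the toy objective

for _ in range(200):
    grad = w - w_star                # gradient of 0.5 * ||w - w_star||^2
    w = pmp_update(w, grad, mask)

# Only masked coordinates were ever updated; the released dense vector `w`
# carries no explicit record of which coordinates those were.
assert np.allclose(w[mask == 0], 0.0)
```

In this sketch the masked coordinates converge to the objective's optimum while the unmasked ones never move, mirroring how PMP concentrates learned representation into the private subnetwork before releasing dense weights.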


Key Contributions

  • Private Mask Pre-Training (PMP): concentrates representation learning into a sparse subnetwork whose binary mask is kept secret, releasing only the dense weights to frustrate unauthorized fine-tuning
  • Theoretical analysis showing that fine-tuning without the mask induces a geometric mismatch with the pre-training subspace that destabilizes gradient-based adaptation and bounds fine-tuning gains
  • Empirical demonstration on LLMs that PMP preserves base model capability while consistently degrading unauthorized fine-tuning across diverse downstream tasks, with non-fine-tunability strength tunable via mask ratio

🛡️ Threat Analysis

Model Theft

The paper's dual motivation includes protecting the economic IP value of open-source models from unauthorized adaptation — analogous to anti-distillation/anti-cloning defenses. PMP functions as a structural deterrent against value extraction through unauthorized downstream specialization, a form of model IP protection.

Transfer Learning Attack

PMP directly defends the transfer learning/fine-tuning process: it makes gradient-based adaptation ineffective for parties lacking the private subnetwork mask, by exploiting a geometric mismatch between fine-tuning updates and the pre-training subspace. The threat model is explicitly unauthorized fine-tuning (LoRA, full fine-tuning) of a released model.
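Why mask secrecy hurts the attacker can be seen with a back-of-the-envelope sketch. This is a toy illustration of the geometric intuition, not the paper's theoretical analysis: with a hypothetical 10% private subnetwork and a dense gradient of no particular structure, only a small fraction of the fine-tuning update's energy lands on the parameters that actually carry the pre-trained representation.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 1000
mask = rng.random(d) < 0.10           # hypothetical 10% private subnetwork

# An unauthorized fine-tuner computes a dense gradient with no mask knowledge.
g = rng.normal(size=d)

in_mask = np.sum(g[mask] ** 2)        # update energy inside the subnetwork
total = np.sum(g ** 2)
print(f"gradient energy inside private subnetwork: {in_mask / total:.2%}")
```

The printed fraction is close to the mask sparsity (about 10% here), so most of the attacker's update budget is spent on coordinates that are misaligned with the pre-training subspace, which is the mismatch the paper's analysis formalizes.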


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, training_time
Applications
open-source foundation model protection, language model pre-training, unauthorized fine-tuning prevention