defense 2026

Making Models Unmergeable via Scaling-Sensitive Loss Landscape

Minwoo Jang , Hoyoung Kim , Jabin Koo , Jungseul Ok

0 citations · 37 references · arXiv

Published on arXiv

2601.21898

Model Theft

OWASP ML Top 10 — ML05

Key Finding

TRAP² degrades merged model performance while preserving standalone utility, providing architecture-agnostic protection against unauthorized model recomposition on model hubs.

TRAP²

Novel technique introduced


The rise of model hubs has made reusable model components easier to access, making model merging a practical tool for combining capabilities. Yet this modularity also creates a governance gap: downstream users can recompose released weights into unauthorized mixtures that bypass safety alignment or licensing terms. Because existing defenses are largely post-hoc and architecture-specific, they provide inconsistent protection across diverse architectures and release formats in practice. To close this gap, we propose TRAP², an architecture-agnostic protection framework that encodes protection into the weight update during fine-tuning, regardless of whether the weights are released as adapters or full models. Instead of relying on architecture-dependent approaches, TRAP² uses weight re-scaling as a simple proxy for the merging process. It keeps released weights effective in standalone use but degrades them under the re-scaling that often arises in merging, undermining unauthorized merging.
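To see why re-scaling can stand in for merging, consider the simplest merge: uniform weight averaging of models fine-tuned from a shared base. The sketch below is an illustration under assumptions, not the paper's procedure. The flat NumPy vectors, the function names `average_merge` and `rescaled_sum`, and the random toy weights are all hypothetical. It shows only that averaging k fine-tuned models is the same as adding each model's update to the base after scaling it by 1/k.

```python
import numpy as np


def average_merge(finetuned):
    """Uniformly average a list of fine-tuned weight vectors."""
    return np.mean(np.stack(finetuned), axis=0)


def rescaled_sum(base, finetuned):
    """Same merge, written as base + sum of updates, each scaled by 1/k."""
    k = len(finetuned)
    return base + sum((w - base) / k for w in finetuned)


# Toy weights (assumption): flat vectors standing in for real model parameters.
rng = np.random.default_rng(0)
base = rng.normal(size=8)
model_a = base + rng.normal(scale=0.1, size=8)
model_b = base + rng.normal(scale=0.1, size=8)

merged = average_merge([model_a, model_b])
equivalent = rescaled_sum(base, [model_a, model_b])

# Both views of the merge give the same weights.
print(np.allclose(merged, equivalent))
```

With two models, each update enters the merge at half strength, so a released update that only works at its original scale would be expected to lose effectiveness once merged.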


Key Contributions

  • Identifies a governance gap where model merging enables downstream users to recompose released weights into unauthorized mixtures that bypass safety alignment or licensing
  • Proposes TRAP², an architecture-agnostic protection framework that encodes scaling-sensitive degradation into model weights during fine-tuning, rendering merged models ineffective while keeping standalone performance intact
  • Uses weight re-scaling as a proxy for the merging process, enabling protection without dependence on specific architecture or release format (adapters or full models)
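One way to probe the scaling sensitivity described in these contributions is to sweep a scale factor s over a released update and record the loss at base + s · update. The snippet below is a diagnostic sketch under assumptions, not the TRAP² training objective. The helper `scale_sweep`, the quadratic `toy_loss`, and the toy vectors are hypothetical stand-ins for a real model and evaluation set.

```python
import numpy as np


def scale_sweep(base, delta, loss_fn, scales):
    """Return {s: loss_fn(base + s * delta)} for each scale s."""
    return {float(s): float(loss_fn(base + s * delta)) for s in scales}


# Toy setup (assumption): the released weights are base + delta, and the
# loss is a quadratic that is minimized exactly at the released weights.
base = np.zeros(4)
delta = np.array([1.0, -2.0, 0.5, 3.0])
released = base + delta


def toy_loss(weights):
    """Squared distance to the released weights; stands in for a task loss."""
    return np.sum((weights - released) ** 2)


profile = scale_sweep(base, delta, toy_loss, scales=[0.25, 0.5, 1.0, 1.5])
for scale, loss in profile.items():
    print(f"scale={scale:.2f} loss={loss:.4f}")
```

In this toy case the loss is zero at s = 1 and grows smoothly as s moves away from 1. On a real model, the same sweep with an actual evaluation loss would show how quickly utility falls off under the re-scaling that merging introduces.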

🛡️ Threat Analysis

Model Theft

TRAP² protects model intellectual property and licensing terms against unauthorized reuse via merging, analogous to anti-distillation techniques. The attacker acquires legitimately released weights and recomposes them into unauthorized mixtures, which is a form of model IP misappropriation. The defense encodes protection into the weights themselves during fine-tuning to prevent this.


Details

Domains
nlp, vision
Model Types
transformer, llm
Threat Tags
training_time, white_box
Applications
model merging, model ip protection, safety alignment preservation, model licensing enforcement