defense 2025

RouteMark: A Fingerprint for Intellectual Property Attribution in Routing-based Model Merging

0 citations

Published on arXiv

2508.01784

Model Theft

OWASP ML Top 10 — ML05

Key Finding

RouteMark consistently yields high fingerprint similarity for reused experts and clear separation from unrelated experts, remaining robust across both structural and parametric tampering operations.

RouteMark

Novel technique introduced

Model merging via Mixture-of-Experts (MoE) has emerged as a scalable solution for consolidating multiple task-specific models into a unified sparse architecture, where each expert is derived from a model fine-tuned on a distinct task. While effective for multi-task integration, this paradigm introduces a critical yet underexplored challenge: how to attribute and protect the intellectual property (IP) of individual experts after merging. We propose RouteMark, a framework for IP protection in merged MoE models through the design of expert routing fingerprints. Our key insight is that task-specific experts exhibit stable and distinctive routing behaviors under probing inputs. To capture these patterns, we construct expert-level fingerprints using two complementary statistics: the Routing Score Fingerprint (RSF), quantifying the intensity of expert activation, and the Routing Preference Fingerprint (RPF), characterizing the input distribution that preferentially activates each expert. These fingerprints are reproducible, task-discriminative, and lightweight to construct. For attribution and tampering detection, we introduce a similarity-based matching algorithm that compares expert fingerprints between a suspect and a reference (victim) model. Extensive experiments across diverse tasks and CLIP-based MoE architectures show that RouteMark consistently yields high similarity for reused experts and clear separation from unrelated ones. Moreover, it remains robust against both structural tampering (expert replacement, addition, deletion) and parametric tampering (fine-tuning, pruning, permutation), outperforming weight- and activation-based baselines. Our work lays the foundation for RouteMark as a practical and broadly applicable framework for IP verification in MoE-based model merging.
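To make the two statistics concrete, here is a minimal sketch of how routing fingerprints of this kind could be computed from a router's outputs on a fixed set of probing inputs. The exact RSF and RPF definitions are not given in this summary, so the formulas below are assumptions chosen to match the descriptions above: the RSF-style value is the mean routing probability an expert receives on the probes that select it, and the RPF-style value is the normalized distribution over the probes that select it. The function name `routing_fingerprints` and the `top_k` parameter are hypothetical.

```python
import numpy as np

def routing_fingerprints(gate_logits, top_k=1):
    """Illustrative fingerprints from router logits (not the paper's exact definition).

    gate_logits: array of shape (num_probes, num_experts), the router logits of one
    MoE layer evaluated on a fixed set of probing inputs.
    Returns (rsf, rpf) with shapes (num_experts,) and (num_experts, num_probes).
    """
    gate_logits = np.asarray(gate_logits, dtype=float)

    # Softmax over experts, shifted for numerical stability.
    shifted = gate_logits - gate_logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)

    # Boolean mask marking the top-k experts chosen for each probe.
    top = np.argsort(-probs, axis=1)[:, :top_k]
    selected = np.zeros(probs.shape, dtype=bool)
    np.put_along_axis(selected, top, True, axis=1)

    # Number of probes routed to each expert (guard against division by zero).
    counts = np.maximum(selected.sum(axis=0), 1)

    # Assumed RSF: mean routing probability over the probes that select the expert.
    rsf = (probs * selected).sum(axis=0) / counts

    # Assumed RPF: for each expert, a distribution over the probes that select it.
    rpf = (selected / counts).T
    return rsf, rpf


# Three probes, two experts: probes 0 and 2 prefer expert 0, probe 1 prefers expert 1.
logits = np.array([[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]])
rsf, rpf = routing_fingerprints(logits)
```

Because both quantities depend only on router outputs for a fixed probe set, they can be recomputed on any model that exposes its routing decisions, which fits the grey-box setting listed for this paper.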

Key Contributions

First systematic formulation of IP attribution in MoE-based model merging, including a structured threat model covering structural and parametric tampering.
RouteMark: routing-based expert fingerprinting using two complementary statistics — Routing Score Fingerprint (RSF) for activation intensity and Routing Preference Fingerprint (RPF) for selection preference.
Demonstrated robustness against six tampering operations (fine-tuning, pruning, permutation, replacement, addition, deletion), outperforming weight- and activation-based baselines.
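The similarity-based matching step can be sketched as follows. This is an illustrative assumption rather than the paper's algorithm: each reference expert is compared with every suspect expert by cosine similarity of their fingerprint vectors, and the best match is reported only if it exceeds a threshold. The helper names `cosine` and `match_experts` and the threshold value of 0.9 are hypothetical. Matching by best similarity instead of by index is what lets a scheme like this tolerate expert permutation and addition.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors; 0.0 if either has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

def match_experts(ref_fps, sus_fps, threshold=0.9):
    """Match reference experts to suspect experts by fingerprint similarity.

    ref_fps: (n_ref, d) fingerprints from the reference (victim) model.
    sus_fps: (n_sus, d) fingerprints from the suspect model, in any order.
    Returns a list of (ref_index, suspect_index, similarity) for matches
    whose similarity is at least `threshold` (an assumed, tunable value).
    """
    ref_fps = np.asarray(ref_fps, dtype=float)
    sus_fps = np.asarray(sus_fps, dtype=float)
    if len(ref_fps) == 0 or len(sus_fps) == 0:
        return []

    matches = []
    for i, ref in enumerate(ref_fps):
        sims = [cosine(ref, sus) for sus in sus_fps]
        j = int(np.argmax(sims))  # best-matching suspect expert, regardless of position
        if sims[j] >= threshold:
            matches.append((i, j, sims[j]))
    return matches


# The suspect model holds the two reference experts in swapped order plus one unrelated expert.
reference = [[0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
suspect = [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [1.0, 0.0, 0.0]]
matches = match_experts(reference, suspect)
```

In this toy case both reference experts are recovered at their new positions, and the added expert is matched to nothing. A reference expert with no match above the threshold would point to deletion or replacement.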

🛡️ Threat Analysis

Model Theft

RouteMark builds behavioral fingerprints from the routing properties of expert modules (that is, from the model itself) to prove ownership and detect unauthorized reuse, a direct defense against model theft and IP misappropriation in MoE architectures.

Details

Domains

vision, multimodal

Model Types

transformer, multimodal

Threat Tags

grey_box, inference_time

Datasets

CLIP-based MoE benchmarks across diverse vision tasks

Applications

moe model merging, model ip attribution, expert module ownership verification

Similar Papers

Catch-Only-One: Non-Transferable Examples for Model-Specific Authorization

SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking

AMIF: Authorizable Medical Image Fusion Model with Built-in Authentication

Practical and Private Hybrid ML Inference with Fully Homomorphic Encryption

Train to Defend: First Defense Against Cryptanalytic Neural Network Parameter Extraction Attacks

DeepTracer: Tracing Stolen Model via Deep Coupled Watermarks

SPOILER: TEE-Shielded DNN Partitioning of On-Device Secure Inference with Poison Learning

Defense against Unauthorized Distillation in Image Restoration via Feature Space Perturbation