Catch-Only-One: Non-Transferable Examples for Model-Specific Authorization
Zihan Wang 1,2, Zhiyong Ma 1, Zhongkui Ma 1, Shuofeng Liu 1, Akide Liu 3, Derui Wang 2, Minhui Xue 2, Guangdong Bai 1
Published on arXiv
arXiv:2510.10982
Model Theft
OWASP ML Top 10 — ML05
Key Finding
Under extreme 0 dB PSNR distortion, the authorized ResNet-50 drops only 0.2 percentage points (80.3%→80.1%) while all unauthorized models collapse to ≈0.1% top-1 accuracy on ImageNet.
Non-Transferable Examples (NEs)
Novel technique introduced
Recent AI regulations call for data that remain useful for innovation while resistant to misuse, balancing utility with protection at the model level. Existing approaches either perturb data to make it unlearnable or retrain models to suppress transfer, but neither governs inference by unknown models, and both typically require control over training. We propose non-transferable examples (NEs), a training-free and data-agnostic input-side usage-control mechanism. We recode inputs within a model-specific low-sensitivity subspace, preserving outputs for the authorized model while reducing performance on unauthorized models through subspace misalignment. We establish formal bounds that guarantee utility for the authorized model and quantify deviation for unauthorized ones, with the Hoffman-Wielandt inequality linking degradation to spectral differences. Empirically, NEs retain performance on diverse vision backbones and state-of-the-art vision-language models under common preprocessing, whereas non-target models collapse even with reconstruction attempts. These results establish NEs as a practical means to preserve intended data utility while preventing unauthorized exploitation. Our project is available at https://trusted-system-lab.github.io/model-specificity
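The core idea of recoding inputs within a low-sensitivity subspace can be illustrated with a toy linear stand-in. The paper works with deep networks and formal spectral bounds; everything below (linear "models", the `recode` helper, the dimensions) is an illustrative assumption, not the authors' algorithm. The sketch perturbs an input along directions the authorized map barely responds to, so the authorized output is preserved while a misaligned model's output deviates:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: two "models" as linear maps. All names and shapes here are
# illustrative assumptions, not the paper's actual construction.
d, k = 64, 10
W_auth = rng.normal(size=(k, d))    # authorized model f*
W_unauth = rng.normal(size=(k, d))  # an unauthorized model

# Low-sensitivity subspace of the authorized model: right singular vectors
# beyond its rank, i.e. directions W_auth barely responds to.
_, _, Vt = np.linalg.svd(W_auth)
V_low = Vt[k:]

def recode(x, strength=5.0):
    """Add a large perturbation confined to the authorized model's
    low-sensitivity subspace (a sketch of the NE recoding idea)."""
    delta = V_low.T @ rng.normal(size=V_low.shape[0])
    return x + strength * delta / np.linalg.norm(delta)

x = rng.normal(size=d)
x_ne = recode(x)

# Authorized output is nearly unchanged; the misaligned model deviates.
drift_auth = np.linalg.norm(W_auth @ x_ne - W_auth @ x)
drift_unauth = np.linalg.norm(W_unauth @ x_ne - W_unauth @ x)
print(drift_auth, drift_unauth)
```

Because the two models' sensitive subspaces are misaligned, a perturbation that is invisible to `W_auth` is large in the directions `W_unauth` depends on, which is the subspace-misalignment mechanism the abstract describes.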
Key Contributions
- Introduces Non-Transferable Examples (NEs), a training-free, data-agnostic mechanism that recodes inputs within a model-specific low-sensitivity subspace so only the authorized model retains usable performance.
- Establishes formal guarantees via matrix perturbation theory and the Hoffman-Wielandt inequality, bounding authorized-utility retention and quantifying unauthorized-model degradation as a function of spectral differences between models.
- Empirically demonstrates that NEs reduce unauthorized vision backbone and VLM performance to near-zero (≈0.1% top-1 on ImageNet) even under reconstruction attempts, while leaving the authorized model's accuracy virtually unchanged.
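The Hoffman-Wielandt inequality invoked in the second contribution states that, for real symmetric matrices with eigenvalues sorted in the same order, the total squared eigenvalue drift is bounded by the squared Frobenius norm of the matrix difference. A quick numerical check of the inequality itself (not of the paper's degradation bound, which builds on it):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hoffman-Wielandt for real symmetric A, B (eigenvalues sorted ascending):
#   sum_i (lambda_i(A) - lambda_i(B))^2  <=  ||A - B||_F^2
n = 8
A = rng.normal(size=(n, n)); A = (A + A.T) / 2
B = A + 0.1 * rng.normal(size=(n, n)); B = (B + B.T) / 2

lam_A = np.sort(np.linalg.eigvalsh(A))
lam_B = np.sort(np.linalg.eigvalsh(B))

lhs = np.sum((lam_A - lam_B) ** 2)          # total squared spectral drift
rhs = np.linalg.norm(A - B, "fro") ** 2     # squared Frobenius gap
print(lhs <= rhs)
```

In the paper's setting, the analogous spectral gap between the authorized and an unauthorized model lower-bounds how differently the two respond to the recoded input, which is what ties unauthorized-model degradation to spectral differences.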
🛡️ Threat Analysis
Non-transferable examples function as a data-side IP protection mechanism enforcing model-specific authorization: data remains exploitable only by the licensed model (f*) and collapses on all unauthorized models. This is analogous to anti-distillation techniques in ML05 — preventing unauthorized models from extracting utility from protected data — and addresses the same MLaaS IP protection threat model (unauthorized scraping and reuse of data for unlicensed inference).