Catch-Only-One: Non-Transferable Examples for Model-Specific Authorization
Zihan Wang 1,2, Zhiyong Ma 1, Zhongkui Ma 1, Shuofeng Liu 1, Akide Liu 3, Derui Wang 2, Minhui Xue 2, Guangdong Bai 1
Published on arXiv
arXiv:2510.10982
Model Theft
OWASP ML Top 10 — ML05
Key Finding
Under extreme 0 dB PSNR distortion, the authorized ResNet-50 drops only 0.2 percentage points (80.3%→80.1%) while all unauthorized models collapse to ≈0.1% top-1 accuracy on ImageNet.
Non-Transferable Examples (NEs)
Novel technique introduced
Recent AI regulations call for data that remain useful for innovation while resistant to misuse, balancing utility with protection at the model level. Existing approaches either perturb data to make it unlearnable or retrain models to suppress transfer, but neither governs inference by unknown models, and both typically require control over training. We propose non-transferable examples (NEs), a training-free and data-agnostic input-side usage-control mechanism. We recode inputs within a model-specific low-sensitivity subspace, preserving outputs for the authorized model while reducing performance on unauthorized models through subspace misalignment. We establish formal bounds that guarantee utility for the authorized model and quantify deviation for unauthorized ones, with the Hoffman-Wielandt inequality linking degradation to spectral differences. Empirically, NEs retain performance on diverse vision backbones and state-of-the-art vision-language models under common preprocessing, whereas non-target models collapse even with reconstruction attempts. These results establish NEs as a practical means to preserve intended data utility while preventing unauthorized exploitation. Our project is available at https://trusted-system-lab.github.io/model-specificity
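The core idea of recoding inputs within a low-sensitivity subspace can be illustrated with a toy linear stand-in. The paper works with deep networks and formal spectral bounds; everything below (linear "models", the `recode` helper, the dimensions) is an illustrative assumption, not the authors' algorithm. The sketch perturbs an input along directions the authorized map barely responds to, so the authorized output is preserved while a misaligned model's output deviates:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: two "models" as linear maps. All names and shapes here are
# illustrative assumptions, not the paper's actual construction.
d, k = 64, 10
W_auth = rng.normal(size=(k, d))    # authorized model f*
W_unauth = rng.normal(size=(k, d))  # an unauthorized model

# Low-sensitivity subspace of the authorized model: right singular vectors
# beyond its rank, i.e. directions W_auth barely responds to.
_, _, Vt = np.linalg.svd(W_auth)
V_low = Vt[k:]

def recode(x, strength=5.0):
    """Add a large perturbation confined to the authorized model's
    low-sensitivity subspace (a sketch of the NE recoding idea)."""
    delta = V_low.T @ rng.normal(size=V_low.shape[0])
    return x + strength * delta / np.linalg.norm(delta)

x = rng.normal(size=d)
x_ne = recode(x)

# Authorized output is nearly unchanged; the misaligned model deviates.
drift_auth = np.linalg.norm(W_auth @ x_ne - W_auth @ x)
drift_unauth = np.linalg.norm(W_unauth @ x_ne - W_unauth @ x)
print(drift_auth, drift_unauth)
```

Because the two models' sensitive subspaces are misaligned, a perturbation that is invisible to `W_auth` is large in the directions `W_unauth` depends on, which is the subspace-misalignment mechanism the abstract describes.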
Key Contributions
- Introduces Non-Transferable Examples (NEs), a training-free, data-agnostic mechanism that recodes inputs within a model-specific low-sensitivity subspace so only the authorized model retains usable performance.
- Establishes formal guarantees via matrix perturbation theory and the Hoffman-Wielandt inequality, bounding authorized-utility retention and quantifying unauthorized-model degradation as a function of spectral differences between models.
- Empirically demonstrates that NEs reduce unauthorized vision backbone and VLM performance to near-zero (≈0.1% top-1 on ImageNet) even under reconstruction attempts, while leaving the authorized model's accuracy virtually unchanged.
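The Hoffman-Wielandt inequality invoked in the second contribution states that, for real symmetric matrices with eigenvalues sorted in the same order, the total squared eigenvalue drift is bounded by the squared Frobenius norm of the matrix difference. A quick numerical check of the inequality itself (not of the paper's degradation bound, which builds on it):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hoffman-Wielandt for real symmetric A, B (eigenvalues sorted ascending):
#   sum_i (lambda_i(A) - lambda_i(B))^2  <=  ||A - B||_F^2
n = 8
A = rng.normal(size=(n, n)); A = (A + A.T) / 2
B = A + 0.1 * rng.normal(size=(n, n)); B = (B + B.T) / 2

lam_A = np.sort(np.linalg.eigvalsh(A))
lam_B = np.sort(np.linalg.eigvalsh(B))

lhs = np.sum((lam_A - lam_B) ** 2)          # total squared spectral drift
rhs = np.linalg.norm(A - B, "fro") ** 2     # squared Frobenius gap
print(lhs <= rhs)
```

In the paper's setting, the analogous spectral gap between the authorized and an unauthorized model lower-bounds how differently the two respond to the recoded input, which is what ties unauthorized-model degradation to spectral differences.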
🛡️ Threat Analysis
Non-transferable examples function as a data-side IP protection mechanism enforcing model-specific authorization: data remains exploitable only by the licensed model (f*) and collapses on all unauthorized models. This is analogous to anti-distillation techniques in ML05 — preventing unauthorized models from extracting utility from protected data — and addresses the same MLaaS IP protection threat model (unauthorized scraping and reuse of data for unlicensed inference).