defense 2025

Knowledge Distillation Detection for Open-weights Models

Qin Shi, Amber Yijia Zheng, Qifan Song, Raymond A. Yeh

1 citation · 65 references · arXiv


Published on arXiv: 2510.02302

Model Theft

OWASP ML Top 10 — ML05

Key Finding

The proposed data-free distillation detection framework improves detection accuracy over the strongest baselines by 71.2% on ImageNet, the largest of the gains reported across the evaluated classification architectures and text-to-image generative models.


We propose the task of knowledge distillation detection, which aims to determine whether a student model has been distilled from a given teacher, under a practical setting where only the student's weights and the teacher's API are available. This problem is motivated by growing concerns about model provenance and unauthorized replication through distillation. To address this task, we introduce a model-agnostic framework that combines data-free input synthesis and statistical score computation for detecting distillation. Our approach is applicable to both classification and generative models. Experiments on diverse architectures for image classification and text-to-image generation show that our method improves detection accuracy over the strongest baselines by 59.6% on CIFAR-10, 71.2% on ImageNet, and 20.0% for text-to-image generation. The code is available at https://github.com/shqii1j/distillation_detection.
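The abstract names two ingredients, data-free input synthesis and a statistical score, but does not spell out their form here. The sketch below only illustrates the access setting (student weights visible, teacher reachable through a label-returning API) under heavy simplifying assumptions: models are toy linear classifiers, "distillation" is simulated by adding small noise to the teacher's weights, and a crude random search that raises the student's confidence stands in for data-free synthesis. The names `synthesize_inputs` and `agreement_score` and all parameters are hypothetical; this is not the authors' method (their code is in the linked repository).

```python
import math
import random

D, K = 8, 4  # hypothetical input dimension and number of classes


def make_linear_model(rng):
    """A toy K-class linear classifier: one weight row per class."""
    return [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(K)]


def logits(weights, x):
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    z = logits(weights, x)
    return max(range(K), key=z.__getitem__)


def synthesize_inputs(student, n, steps, rng):
    """Data-free stand-in (assumption): random search that nudges Gaussian noise
    toward inputs the student assigns confidently to a random target class.
    Only the student's weights are used; no training data is touched."""
    inputs = []
    for _ in range(n):
        target = rng.randrange(K)
        x = [rng.gauss(0.0, 1.0) for _ in range(D)]
        best = softmax(logits(student, x))[target]
        for _ in range(steps):
            cand = [v + rng.gauss(0.0, 0.3) for v in x]
            p = softmax(logits(student, cand))[target]
            if p > best:  # keep the proposal only if confidence improves
                x, best = cand, p
        inputs.append(x)
    return inputs


def agreement_score(student, teacher_api, inputs):
    """Fraction of probe inputs on which the student's label matches the label
    returned by the black-box teacher API."""
    matches = sum(predict(student, x) == teacher_api(x) for x in inputs)
    return matches / len(inputs)


if __name__ == "__main__":
    rng = random.Random(0)
    teacher = make_linear_model(rng)
    teacher_api = lambda x: predict(teacher, x)  # label-only black-box access
    # Simulated "distilled" student: teacher weights plus small noise (assumption).
    distilled = [[w + rng.gauss(0.0, 0.1) for w in row] for row in teacher]
    unrelated = make_linear_model(rng)  # independently initialized model
    for name, student in [("distilled", distilled), ("unrelated", unrelated)]:
        probes = synthesize_inputs(student, n=200, steps=50, rng=rng)
        print(name, agreement_score(student, teacher_api, probes))
```

Probing with inputs the student is confident about is a design choice: on such inputs a model that inherited the teacher's decision boundaries should keep agreeing with it, while an unrelated model agrees only by chance.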


Key Contributions

  • Introduces the novel task of knowledge distillation detection: determining whether a student model was distilled from a specific teacher using only the student's weights and the teacher's API
  • Proposes a model-agnostic, data-free framework combining synthetic input generation with statistical scoring to detect distillation across classification and generative models
  • Demonstrates substantial detection-accuracy improvements over the strongest baselines: +59.6% on CIFAR-10, +71.2% on ImageNet, +20.0% for text-to-image generation
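The summary does not state which statistical score the paper computes. One common way to turn a raw similarity between the student and a candidate teacher (for example, label agreement on probe inputs) into a detection decision is to compare it with the student's similarity to reference models known not to be its teacher. The sketch below, a hypothetical illustration rather than the paper's score, computes a z-score and a rank-based empirical p-value against such a reference pool; `distillation_z_score`, `empirical_p_value`, `is_distilled`, the threshold of 3.0, and the example numbers are all assumptions.

```python
import math
import statistics


def distillation_z_score(candidate_score, reference_scores):
    """How many standard deviations the student's similarity to the candidate
    teacher lies above its similarity to reference (non-teacher) models.
    Requires at least two reference scores."""
    mu = statistics.mean(reference_scores)
    sigma = statistics.stdev(reference_scores)  # sample standard deviation
    if sigma == 0.0:  # degenerate pool: any nonzero gap is treated as infinite
        diff = candidate_score - mu
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return (candidate_score - mu) / sigma


def empirical_p_value(candidate_score, reference_scores):
    """Rank-based p-value: (1 + #references scoring at least as high) / (1 + n)."""
    at_least = sum(r >= candidate_score for r in reference_scores)
    return (1 + at_least) / (1 + len(reference_scores))


def is_distilled(candidate_score, reference_scores, z_threshold=3.0):
    """Hypothetical decision rule: flag distillation when the z-score exceeds a threshold."""
    return distillation_z_score(candidate_score, reference_scores) > z_threshold


if __name__ == "__main__":
    refs = [0.31, 0.27, 0.35, 0.29, 0.33]  # made-up similarities to non-teacher models
    print(distillation_z_score(0.92, refs))
    print(empirical_p_value(0.92, refs))
    print(is_distilled(0.92, refs))
```

With these five made-up reference scores the z-score for a candidate score of 0.92 is about 19, yet the rank p-value cannot fall below 1/6: a small reference pool bounds how strong a nonparametric test can be, whereas the z-score gets its extra power by implicitly assuming the reference scores are roughly Gaussian.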

🛡️ Threat Analysis

Model Theft

The paper defends against model theft by detecting whether a student model was illegitimately cloned from a teacher via distillation. This is a form of model fingerprinting and provenance verification: it determines whether a model's intellectual property was stolen through distillation, which places it squarely under ML05 (model theft) defense.


Details

Domains
vision, generative
Model Types
cnn, transformer, diffusion
Threat Tags
black_box, inference_time
Datasets
CIFAR-10, ImageNet
Applications
image classification, text-to-image generation, model provenance verification