Defense · 2025

DistilLock: Safeguarding LLMs from Unauthorized Knowledge Distillation on the Edge

Asmita Mohanty 1, Gezheng Kang 2, Lei Gao 1, Murali Annavaram 1

0 citations · 30 references · arXiv


Published on arXiv · 2510.16716

Model Theft

OWASP ML Top 10: ML05 · OWASP LLM Top 10: LLM10

Key Finding

DistilLock prevents unauthorized knowledge distillation and resists model-stealing attacks on edge devices while incurring only minimal computation overhead via lightweight TEE authorization.

DistilLock

Novel technique introduced


Large Language Models (LLMs) have demonstrated strong performance across diverse tasks, but fine-tuning them typically relies on cloud-based, centralized infrastructures. This requires data owners to upload potentially sensitive data to external servers, raising serious privacy concerns. An alternative approach is to fine-tune LLMs directly on edge devices using local data; however, this introduces a new challenge: the model owner must transfer proprietary models to the edge, which risks intellectual property (IP) leakage. To address this dilemma, we propose DistilLock, a TEE-assisted fine-tuning framework that enables privacy-preserving knowledge distillation on the edge. In DistilLock, a proprietary foundation model is executed within a trusted execution environment (TEE) enclave on the data owner's device, acting as a secure black-box teacher. This setup preserves both data privacy and model IP by preventing direct access to model internals. Furthermore, DistilLock employs a model obfuscation mechanism to offload obfuscated weights to untrusted accelerators for efficient knowledge distillation without compromising security. We demonstrate that DistilLock prevents unauthorized knowledge distillation processes and model-stealing attacks while maintaining high computational efficiency, offering a secure and practical solution for edge-based LLM personalization.
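The black-box teacher setup the abstract describes boils down to standard knowledge distillation: the student only ever sees the teacher's output logits, never its weights. A minimal sketch of the temperature-softened KL distillation loss (the loss formulation and temperature value are generic assumptions, not taken from the paper):

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax, numerically stabilized.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The teacher runs as a black box (e.g. inside a TEE enclave), so
    only its logits cross the enclave boundary.
    """
    p = softmax(teacher_logits, T)      # teacher's soft targets
    q = softmax(student_logits, T)      # student's predictions
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(np.mean(kl) * T * T)   # T^2 restores gradient scale

# Identical logits give zero loss; diverging logits give positive loss.
same = distillation_loss([[2.0, 0.5, -1.0]], [[2.0, 0.5, -1.0]])
diff = distillation_loss([[2.0, 0.5, -1.0]], [[-1.0, 0.5, 2.0]])
```

DistilLock's contribution is not the loss itself but gating this query interface behind TEE authorization, so only approved students can run the distillation loop.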


Key Contributions

  • TEE-enclave execution of proprietary LLM as a secure black-box teacher, preventing direct weight access during on-device knowledge distillation
  • Model obfuscation mechanism that offloads compute-heavy operations to untrusted GPU accelerators while keeping obfuscated weights non-reverse-engineerable
  • Demonstrated resistance to model-stealing attacks including surrogate training on the obfuscated model, with minimal computational overhead
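To make the offloading idea in the second bullet concrete, here is a deliberately simplified sketch in which a secret permutation stands in for the obfuscation: the untrusted accelerator holds shuffled weights and does the heavy multiply, while the TEE holds the permutation needed to make the result meaningful. The permutation scheme is a toy assumption for illustration, not DistilLock's actual mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, batch = 16, 4, 2
W = rng.normal(size=(d_in, d_out))   # proprietary weight matrix
perm = rng.permutation(d_in)         # secret, held inside the TEE

W_obf = W[perm, :]                   # shuffled rows, offloaded to the GPU

def tee_offload_matmul(x, W_obf, perm):
    # The TEE permutes the activations to match the obfuscated weights,
    # then the untrusted accelerator performs the expensive matmul.
    return x[:, perm] @ W_obf

x = rng.normal(size=(batch, d_in))
ref = x @ W                                # what the clean model computes
out = tee_offload_matmul(x, W_obf, perm)   # identical, despite obfuscation
```

Because the permutation cancels inside the dot product, correctness is preserved, yet the offloaded matrix alone is not directly usable without the secret held in the enclave. A bare permutation would be weak in practice; the paper's scheme is designed to be non-reverse-engineerable.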

🛡️ Threat Analysis

Model Theft

DistilLock explicitly defends against model-stealing attacks and unauthorized knowledge distillation, a form of model extraction, protecting the proprietary LLM's weights and learned functionality (IP) from being cloned by an adversary with white-box access to the edge device.
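The extraction threat is easy to see on a toy victim: with unrestricted query access, an adversary can fit a surrogate to the input-to-output mapping. This linear example (my own illustration, not from the paper) shows why ungated query access is dangerous:

```python
import numpy as np

rng = np.random.default_rng(1)

W_victim = rng.normal(size=(8, 3))   # secret "model" weights
oracle = lambda X: X @ W_victim      # black-box query interface

# The adversary issues queries and records the responses.
X = rng.normal(size=(200, 8))
Y = oracle(X)

# A least-squares surrogate fit recovers the noiseless linear victim
# almost exactly from query/response pairs alone.
W_stolen, *_ = np.linalg.lstsq(X, Y, rcond=None)
err = float(np.max(np.abs(W_stolen - W_victim)))
```

Real LLMs are far harder to clone exactly, but surrogate training on teacher outputs is precisely the distillation-style extraction DistilLock blocks by requiring TEE-backed authorization before the teacher will answer queries.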


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
white_box, black_box, inference_time
Applications
edge llm deployment, on-device llm fine-tuning, model ip protection