DistilLock: Safeguarding LLMs from Unauthorized Knowledge Distillation on the Edge
Asmita Mohanty 1, Gezheng Kang 2, Lei Gao 1, Murali Annavaram 1
Published on arXiv (arXiv:2510.16716)
Model Theft
OWASP ML Top 10 — ML05
Model Theft
OWASP LLM Top 10 — LLM10
Key Finding
DistilLock prevents unauthorized knowledge distillation and resists model-stealing attacks on edge devices while incurring only minimal computational overhead via lightweight TEE authorization.
DistilLock
Novel technique introduced
Large Language Models (LLMs) have demonstrated strong performance across diverse tasks, but fine-tuning them typically relies on cloud-based, centralized infrastructure. This requires data owners to upload potentially sensitive data to external servers, raising serious privacy concerns. An alternative is to fine-tune LLMs directly on edge devices using local data; however, this introduces a new challenge: the model owner must transfer proprietary models to the edge, which risks intellectual property (IP) leakage. To address this dilemma, we propose DistilLock, a TEE-assisted fine-tuning framework that enables privacy-preserving knowledge distillation on the edge. In DistilLock, a proprietary foundation model is executed within a trusted execution environment (TEE) enclave on the data owner's device, acting as a secure black-box teacher. This setup preserves both data privacy and model IP by preventing direct access to model internals. Furthermore, DistilLock employs a model obfuscation mechanism to offload obfuscated weights to untrusted accelerators for efficient knowledge distillation without compromising security. We demonstrate that DistilLock prevents unauthorized knowledge distillation and model-stealing attacks while maintaining high computational efficiency, offering a secure and practical solution for edge-based LLM personalization.
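The black-box-teacher setup described above can be illustrated with a minimal sketch: the student only ever sees the teacher's output logits, never its weights, so distillation proceeds via the standard soft-label objective. The function names and the temperature-scaled KL loss below are illustrative assumptions, not DistilLock's actual API.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) at a given temperature, the standard
    soft-label distillation objective. Note the teacher is queried
    purely as a black box: only its output logits are consumed,
    which is all a TEE-enclosed teacher needs to expose."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# A perfectly matched student incurs zero distillation loss.
assert abs(distill_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])) < 1e-12
# A mismatched student incurs a positive loss.
assert distill_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]) > 0.0
```

Because only logits cross the enclave boundary, the teacher's parameters never leave the TEE, which is what makes the teacher a "secure black box" in this framing.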
Key Contributions
- TEE-enclave execution of proprietary LLM as a secure black-box teacher, preventing direct weight access during on-device knowledge distillation
- Model obfuscation mechanism that offloads compute-heavy operations to untrusted GPU accelerators while keeping obfuscated weights non-reverse-engineerable
- Demonstrated resistance to model-stealing attacks including surrogate training on the obfuscated model, with minimal computational overhead
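The obfuscated-offloading idea in the contributions above can be sketched with a toy linear layer: the TEE holds a secret transformation (here, a random row permutation plus per-row scaling, chosen purely for illustration; the paper's actual obfuscation scheme may differ), the untrusted accelerator multiplies by the obfuscated weights, and the enclave cheaply undoes the transformation on the result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Secret state held inside the TEE: a random row permutation and
# per-row scaling of the layer weights (illustrative assumptions).
d_out, d_in = 6, 4
W = rng.standard_normal((d_out, d_in))       # proprietary weights
perm = rng.permutation(d_out)                # secret permutation
scale = rng.uniform(0.5, 2.0, size=d_out)    # secret scaling

# What the untrusted accelerator receives: obfuscated weights only.
W_obf = (scale[:, None] * W)[perm]

def untrusted_matmul(x):
    # Runs on the GPU: an ordinary matmul over obfuscated weights;
    # the accelerator never sees W, perm, or scale.
    return W_obf @ x

def tee_deobfuscate(y_obf):
    # Runs in the enclave: invert the permutation, then the scaling.
    # Both steps are O(d_out), far cheaper than the matmul itself.
    y = np.empty_like(y_obf)
    y[perm] = y_obf
    return y / scale

x = rng.standard_normal(d_in)
# The de-obfuscated result matches the true layer output.
assert np.allclose(tee_deobfuscate(untrusted_matmul(x)), W @ x)
```

The design point this illustrates: the heavy computation (the matmul) is offloaded, while the lightweight secret transform and its inverse stay inside the enclave, so correctness is preserved without revealing usable weights to the accelerator.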
🛡️ Threat Analysis
DistilLock explicitly defends against model-stealing attacks and unauthorized knowledge distillation (a form of model extraction), protecting the proprietary LLM's weights and learned functionality (its IP) from being cloned by an adversary with white-box access to the edge device.