DistilLock: Safeguarding LLMs from Unauthorized Knowledge Distillation on the Edge
Asmita Mohanty 1, Gezheng Kang 2, Lei Gao 1, Murali Annavaram 1
Published on arXiv (arXiv:2510.16716)
Model Theft
OWASP ML Top 10 — ML05
Model Theft
OWASP LLM Top 10 — LLM10
Key Finding
DistilLock prevents unauthorized knowledge distillation and resists model-stealing attacks on edge devices while incurring only minimal computational overhead via lightweight TEE authorization.
DistilLock
Novel technique introduced
Large Language Models (LLMs) have demonstrated strong performance across diverse tasks, but fine-tuning them typically relies on cloud-based, centralized infrastructure. This requires data owners to upload potentially sensitive data to external servers, raising serious privacy concerns. An alternative is to fine-tune LLMs directly on edge devices using local data; however, this introduces a new challenge: the model owner must transfer proprietary models to the edge, which risks intellectual property (IP) leakage. To address this dilemma, we propose DistilLock, a TEE-assisted fine-tuning framework that enables privacy-preserving knowledge distillation on the edge. In DistilLock, a proprietary foundation model is executed within a trusted execution environment (TEE) enclave on the data owner's device, acting as a secure black-box teacher. This setup preserves both data privacy and model IP by preventing direct access to model internals. Furthermore, DistilLock employs a model obfuscation mechanism to offload obfuscated weights to untrusted accelerators for efficient knowledge distillation without compromising security. We demonstrate that DistilLock prevents unauthorized knowledge distillation and model-stealing attacks while maintaining high computational efficiency, offering a secure and practical solution for edge-based LLM personalization.
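The black-box-teacher setup described above can be illustrated with a minimal sketch: the student only ever sees the teacher's output logits, never its weights, so distillation proceeds via the standard soft-label objective. The function names and the temperature-scaled KL loss below are illustrative assumptions, not DistilLock's actual API.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) at a given temperature, the standard
    soft-label distillation objective. Note the teacher is queried
    purely as a black box: only its output logits are consumed,
    which is all a TEE-enclosed teacher needs to expose."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# A perfectly matched student incurs zero distillation loss.
assert abs(distill_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])) < 1e-12
# A mismatched student incurs a positive loss.
assert distill_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]) > 0.0
```

Because only logits cross the enclave boundary, the teacher's parameters never leave the TEE, which is what makes the teacher a "secure black box" in this framing.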
Key Contributions
- TEE-enclave execution of proprietary LLM as a secure black-box teacher, preventing direct weight access during on-device knowledge distillation
- Model obfuscation mechanism that offloads compute-heavy operations to untrusted GPU accelerators while keeping obfuscated weights non-reverse-engineerable
- Demonstrated resistance to model-stealing attacks including surrogate training on the obfuscated model, with minimal computational overhead
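The obfuscated-offloading idea in the contributions above can be sketched with a toy linear layer: the TEE holds a secret transformation (here, a random row permutation plus per-row scaling, chosen purely for illustration; the paper's actual obfuscation scheme may differ), the untrusted accelerator multiplies by the obfuscated weights, and the enclave cheaply undoes the transformation on the result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Secret state held inside the TEE: a random row permutation and
# per-row scaling of the layer weights (illustrative assumptions).
d_out, d_in = 6, 4
W = rng.standard_normal((d_out, d_in))       # proprietary weights
perm = rng.permutation(d_out)                # secret permutation
scale = rng.uniform(0.5, 2.0, size=d_out)    # secret scaling

# What the untrusted accelerator receives: obfuscated weights only.
W_obf = (scale[:, None] * W)[perm]

def untrusted_matmul(x):
    # Runs on the GPU: an ordinary matmul over obfuscated weights;
    # the accelerator never sees W, perm, or scale.
    return W_obf @ x

def tee_deobfuscate(y_obf):
    # Runs in the enclave: invert the permutation, then the scaling.
    # Both steps are O(d_out), far cheaper than the matmul itself.
    y = np.empty_like(y_obf)
    y[perm] = y_obf
    return y / scale

x = rng.standard_normal(d_in)
# The de-obfuscated result matches the true layer output.
assert np.allclose(tee_deobfuscate(untrusted_matmul(x)), W @ x)
```

The design point this illustrates: the heavy computation (the matmul) is offloaded, while the lightweight secret transform and its inverse stay inside the enclave, so correctness is preserved without revealing usable weights to the accelerator.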
🛡️ Threat Analysis
DistilLock explicitly defends against model-stealing attacks and unauthorized knowledge distillation (a form of model extraction), protecting the proprietary LLM's weights and learned functionality (its IP) from being cloned by an adversary with white-box access to the edge device.