defense 2025

TZ-LLM: Protecting On-Device Large Language Models with Arm TrustZone

Xunjie Wang , Jiacheng Shi , Zihan Zhao , Yang Yu , Zhichao Hua , Jinyu Gu

2 citations · 90 references · arXiv


Published on arXiv: 2511.13717

Model Theft (OWASP ML Top 10 — ML05)

Model Theft (OWASP LLM Top 10 — LLM10)

Key Finding

Reduces time-to-first-token (TTFT) by up to 90.9% and increases decoding speed by up to 23.2% compared to a strawman TEE baseline without the proposed optimizations

TZ-LLM

Novel technique introduced


Large Language Models (LLMs) deployed on mobile devices offer benefits like user privacy and reduced network latency, but introduce a significant security risk: the leakage of proprietary models to end users. To mitigate this risk, we propose a system design for protecting on-device LLMs using Arm Trusted Execution Environment (TEE), TrustZone. Our system addresses two primary challenges: (1) the dilemma between memory efficiency and fast inference when caching model parameters within TEE memory; (2) the lack of efficient and secure Neural Processing Unit (NPU) time-sharing between the Rich Execution Environment (REE) and the TEE. Our approach incorporates two key innovations. First, we employ pipelined restoration, leveraging the deterministic memory access patterns of LLM inference to prefetch parameters on demand, hiding memory allocation, I/O and decryption latency under computation time. Second, we introduce a co-driver design, creating a minimal data plane NPU driver in the TEE that collaborates with the full-fledged REE driver. This reduces the TEE TCB size and eliminates control plane reinitialization overhead during NPU world switches. We implemented our system on the emerging OpenHarmony OS and the llama.cpp inference framework, and evaluated it with various LLMs on an Arm Rockchip device. Compared to a strawman TEE baseline lacking our optimizations, our system reduces TTFT by up to 90.9% and increases decoding speed by up to 23.2%.
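The pipelined restoration idea described in the abstract can be sketched as a double-buffered prefetch loop: while the NPU computes layer i, a background thread loads and decrypts layer i+1, so I/O and decryption latency hide under computation time. Everything below is a simplified stand-in, not the paper's implementation: `load_and_decrypt` and `compute` are hypothetical placeholders for the TEE-side parameter restoration and NPU kernel dispatch.

```python
import queue
import threading
import time

def load_and_decrypt(layer_id):
    """Hypothetical stand-in for allocating secure memory, reading the
    encrypted parameter file, and decrypting one layer inside the TEE."""
    time.sleep(0.005)  # models I/O + decryption latency
    return f"weights[{layer_id}]"

def compute(layer_id, weights, activations):
    """Hypothetical stand-in for running one transformer layer on the NPU."""
    time.sleep(0.005)  # models computation time
    return activations + [layer_id]

def pipelined_inference(num_layers):
    """Prefetch layer i+1 while layer i computes (double buffering)."""
    prefetched = queue.Queue(maxsize=1)  # one layer staged ahead of compute

    def prefetcher():
        for i in range(num_layers):
            prefetched.put((i, load_and_decrypt(i)))  # blocks when buffer full

    t = threading.Thread(target=prefetcher, daemon=True)
    t.start()

    activations = []
    for i in range(num_layers):
        layer_id, weights = prefetched.get()  # already resident if prefetch kept up
        activations = compute(layer_id, weights, activations)
    t.join()
    return activations
```

With balanced load and compute times, each `get()` returns almost immediately, so total latency approaches pure computation time rather than the sum of restoration and computation.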


Key Contributions

  • TZ-LLM system design that keeps LLM model parameters encrypted inside Arm TrustZone TEE to prevent proprietary model theft by device end users
  • Pipelined restoration technique that prefetches and decrypts model parameters on demand, hiding I/O and decryption latency behind computation to minimize TTFT
  • Co-driver design creating a minimal TEE-side NPU data plane driver that collaborates with the full REE driver, reducing TCB size and eliminating NPU world-switch reinitialization overhead
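The co-driver split in the last bullet can be illustrated with a toy model. All class and method names here are hypothetical; the point is only the division of labor: the REE keeps the full driver (control plane: firmware load, clocks, power), while the TEE holds a minimal data-plane driver that submits jobs to hardware the REE already initialized, so a world switch requires no reinitialization.

```python
class ReeControlPlane:
    """Full-fledged REE driver: one-time NPU bring-up and management."""
    def __init__(self):
        self.initialized = False

    def init_npu(self):
        # firmware load, clock/power setup, IRQ routing: done once, in the REE
        self.initialized = True

class TeeDataPlane:
    """Minimal TEE driver: only job submission on pre-initialized hardware,
    keeping the TEE's trusted computing base small."""
    def __init__(self, control):
        self.control = control
        self.completed = []

    def submit(self, job):
        if not self.control.initialized:
            raise RuntimeError("NPU not initialized by REE control plane")
        # sketched: map secure buffers, write command ring, ring doorbell
        self.completed.append(job)
        return job

ree = ReeControlPlane()
ree.init_npu()                            # control-plane work stays in the REE
tee = TeeDataPlane(ree)
for switch in range(3):                   # repeated world switches...
    tee.submit(f"inference-job-{switch}")  # ...submit with no TEE-side reinit
```

Because the TEE side holds no control-plane state, nothing needs tearing down or re-establishing when execution moves between worlds, which is the source of the eliminated world-switch overhead.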

🛡️ Threat Analysis

Model Theft

The paper explicitly targets model IP theft: end users could extract proprietary LLM weights from mobile devices. TZ-LLM keeps model parameters encrypted inside TEE, preventing REE-side access and protecting model intellectual property — a direct ML05 defense.
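The at-rest protection implied here — parameters stored encrypted, decryptable only with a key held by the TEE — can be sketched with authenticated sealing. This is a stand-in, not the paper's scheme: SHA-256 in counter mode substitutes for a real cipher such as AES-CTR, and the key-provisioning path is omitted entirely.

```python
import hashlib
import hmac

def keystream(key, nonce, length):
    """SHA-256 counter-mode keystream: a stand-in for a real stream cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def seal(key, nonce, plaintext):
    """Encrypt a parameter blob and tag it so tampering is detectable."""
    ct = bytes(a ^ b for a, b in zip(plaintext, keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    return ct, tag

def open_in_tee(key, nonce, ct, tag):
    """TEE-side unseal: verify integrity, then decrypt. The plaintext
    weights never leave TEE memory; the REE only ever sees (ct, tag)."""
    expected = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("parameter blob failed integrity check")
    return bytes(a ^ b for a, b in zip(ct, keystream(key, nonce, len(ct))))
```

An REE-side attacker who dumps storage or shared memory obtains only ciphertext and tags, which is what makes this a direct ML05 (model theft) defense.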


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box · inference_time
Datasets
llama.cpp benchmarks on Arm Rockchip device
Applications
on-device llm inference · mobile llm deployment