
AttestLLM: Efficient Attestation Framework for Billion-scale On-device LLMs

Ruisi Zhang 1, Yifei Zhao 2, Neusha Javidnia 1, Mengxin Zheng 2, Farinaz Koushanfar 1


Published on arXiv: 2509.06326

Model Theft (OWASP ML Top 10: ML05)

Key Finding

AttestLLM achieves reliable model attestation for billion-parameter LLMs without compromising inference throughput, with demonstrated resilience against model replacement and forgery attacks.

AttestLLM

Novel technique introduced


As on-device LLMs (e.g., Apple on-device Intelligence) are widely adopted to reduce network dependency, improve privacy, and enhance responsiveness, verifying the legitimacy of models running on local devices becomes critical. Existing attestation techniques are not suitable for billion-parameter Large Language Models (LLMs): they struggle to remain both time- and memory-efficient while addressing emerging threats in the LLM era. In this paper, we present AttestLLM, the first-of-its-kind attestation framework to protect the hardware-level intellectual property (IP) of device vendors by ensuring that only authorized LLMs can execute on target platforms. AttestLLM leverages an algorithm/software/hardware co-design approach to embed robust watermarking signatures onto the activation distributions of LLM building blocks. It also optimizes the attestation protocol within the Trusted Execution Environment (TEE), providing efficient verification without compromising inference throughput. Extensive proof-of-concept evaluations on LLMs from the Llama, Qwen, and Phi families for on-device use cases demonstrate AttestLLM's attestation reliability, fidelity, and efficiency. Furthermore, AttestLLM enforces model legitimacy and exhibits resilience against model replacement and forgery attacks.


Key Contributions

  • First attestation framework designed for billion-parameter on-device LLMs, co-designing algorithm, software, and hardware to embed ownership watermarks into layer activation distributions
  • TEE-optimized attestation protocol with system-level optimizations that verify model legitimacy without degrading inference throughput
  • Demonstrated resilience against model replacement, forgery, and TEE system attacks across Llama, Qwen, and Phi LLM families
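The paper embeds watermark signatures into the activation distributions of LLM building blocks; the exact scheme is not reproduced here. A minimal sketch of one plausible keyed activation-signature check, with all function names, the random-projection construction, and the bit-error threshold being illustrative assumptions rather than the paper's method:

```python
import numpy as np

def activation_signature(activations: np.ndarray, key: int, n_bits: int = 64) -> np.ndarray:
    """Hypothetical: project per-channel activation means onto a keyed
    random basis and take the sign of each projection as one signature bit."""
    rng = np.random.default_rng(key)            # owner's secret key seeds the basis
    means = activations.mean(axis=0)            # per-channel mean activation
    basis = rng.standard_normal((n_bits, means.size))
    return (basis @ means > 0).astype(np.uint8)

def attest_block(activations: np.ndarray, key: int,
                 registered_sig: np.ndarray, max_bit_errors: int = 4) -> bool:
    """Accept the block iff its recomputed signature is within a small
    Hamming distance of the signature registered at provisioning time."""
    sig = activation_signature(activations, key, registered_sig.size)
    return int(np.sum(sig != registered_sig)) <= max_bit_errors
```

A small Hamming-distance tolerance (rather than exact equality) is what lets such a check survive benign numerical drift, e.g. from quantization, while still flagging a replaced or forged block whose activation statistics differ.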

🛡️ Threat Analysis

Model Theft

The watermark is embedded in the model's activation distributions, not in its text outputs, to verify model identity and prove ownership, defending device vendors' hardware IP against model replacement and forgery attacks. This is model ownership watermarking, not content provenance.
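The paper verifies legitimacy inside the TEE without degrading inference throughput; the protocol details are not given on this page. One common way to keep verification off the inference critical path is to spot-check a random subset of building blocks per round, sketched below with hypothetical helper names (this is an assumption, not AttestLLM's actual protocol):

```python
import hashlib
import random

def block_hash(weights: bytes) -> str:
    """Digest of one building block's weights (illustrative)."""
    return hashlib.sha256(weights).hexdigest()

def attest_round(block_hashes: list, reference_hashes: list,
                 sample_size: int = 4, seed=None) -> bool:
    """Hypothetical TEE-side round: verify a random subset of block
    digests against references registered in the TEE, so the whole
    model never needs re-checking during a single inference."""
    rng = random.Random(seed)
    indices = rng.sample(range(len(block_hashes)), sample_size)
    return all(block_hashes[i] == reference_hashes[i] for i in indices)
```

Sampling makes each round cheap while repeated rounds drive the probability of a tampered block escaping detection toward zero; a full-model replacement fails on the very first sampled block.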


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
white_box, inference_time, digital
Datasets
Llama (family), Qwen (family), Phi (family)
Applications
on-device llm deployment, mobile ai, hardware ip protection