AWM: Accurate Weight-Matrix Fingerprint for Large Language Models
Boyi Zeng 1,2, Lin Chen 1,2, Ziwei He 3,1, Xinbing Wang 1, Zhouhan Lin 1,3
Published on arXiv
2510.06738
Model Theft
OWASP ML Top 10 — ML05
Model Theft
OWASP LLM Top 10 — LLM10
Key Finding
Achieves a perfect AUC of 1.0 across 150 LLM pairs with a near-zero false-positive rate, outperforming the HuRef and REEF baselines across all six post-training modification categories, and the full computation completes in under 30 seconds.
AWM (Accurate Weight-Matrix Fingerprint)
Novel technique introduced
Protecting the intellectual property of large language models (LLMs) is crucial, given the substantial resources required for their training. Consequently, there is an urgent need for both model owners and third parties to determine whether a suspect LLM is trained from scratch or derived from an existing base model. However, the intensive post-training processes that models typically undergo (such as supervised fine-tuning, extensive continued pretraining, reinforcement learning, multi-modal extension, pruning, and upcycling) pose significant challenges to reliable identification. In this work, we propose a training-free fingerprinting method based on weight matrices. We leverage the Linear Assignment Problem (LAP) and an unbiased Centered Kernel Alignment (CKA) similarity to neutralize the effects of parameter manipulations, yielding a highly robust and high-fidelity similarity metric. On a comprehensive testbed of 60 positive and 90 negative model pairs, our method demonstrates exceptional robustness against all six aforementioned post-training categories while exhibiting a near-zero risk of false positives. By achieving perfect scores on all classification metrics, our approach establishes a strong basis for reliable model lineage verification. Moreover, the entire computation completes within 30 seconds on an NVIDIA 3090 GPU. The code is available at https://github.com/LUMIA-Group/AWM.
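As a rough illustration of the unbiased CKA similarity the abstract references, the sketch below implements the standard debiased linear CKA built on Song et al.'s unbiased HSIC estimator. The function names, matrix shapes, and use of raw weight matrices as inputs are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def unbiased_hsic(K, L):
    """Unbiased HSIC estimator (Song et al., 2012) for Gram matrices K, L."""
    n = K.shape[0]
    K = K - np.diag(np.diag(K))  # zero the diagonals, as the estimator requires
    L = L - np.diag(np.diag(L))
    ones = np.ones(n)
    term1 = np.trace(K @ L)
    term2 = (ones @ K @ ones) * (ones @ L @ ones) / ((n - 1) * (n - 2))
    term3 = (2.0 / (n - 2)) * (ones @ K @ L @ ones)
    return (term1 + term2 - term3) / (n * (n - 3))

def unbiased_cka(X, Y):
    """Debiased linear CKA between matrices X (n x d1) and Y (n x d2)."""
    K, L = X @ X.T, Y @ Y.T  # linear-kernel Gram matrices
    return unbiased_hsic(K, L) / np.sqrt(unbiased_hsic(K, K) * unbiased_hsic(L, L))
```

Because linear CKA compares Gram matrices, it is unchanged by any orthogonal rotation or uniform scaling applied along the feature dimension, which is what makes this family of metrics robust to those parameter manipulations.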
Key Contributions
- Training-free fingerprinting method using Linear Assignment Problem (LAP) and unbiased Centered Kernel Alignment (CKA) to compute weight-matrix similarity that is invariant to scaling, permutation, pruning, and rotation manipulations
- Demonstrated robustness across all six post-training modification categories (SFT, extensive continued pretraining up to 5.5T tokens, RL, multimodal extension, pruning, upcycling) on a testbed of 150 model pairs
- Achieves perfect classification metrics (AUC, pAUC, TPR@1%FPR = 1.0) with near-zero false positive risk and completes in under 30 seconds on a single NVIDIA 3090 GPU
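The permutation-neutralizing role of the Linear Assignment Problem can be sketched with SciPy's LAP solver: given two weight matrices whose rows may have been permuted, solve an assignment over pairwise similarities to recover the alignment. The cosine-similarity cost and the `align_rows` helper are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_rows(W_base, W_suspect):
    """Recover a row alignment between two weight matrices via a LAP.

    The cost is negative absolute cosine similarity, so the optimal
    assignment pairs each base row with its most similar suspect row.
    """
    A = W_base / np.linalg.norm(W_base, axis=1, keepdims=True)
    B = W_suspect / np.linalg.norm(W_suspect, axis=1, keepdims=True)
    cost = -np.abs(A @ B.T)
    _, cols = linear_sum_assignment(cost)
    return cols  # W_suspect[cols] is row-aligned with W_base
```

After alignment, a similarity metric such as CKA can be computed on matched rows, so an adversary cannot hide lineage simply by shuffling neurons.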
🛡️ Threat Analysis
Proposes weight-matrix fingerprinting to verify model ownership and lineage, directly defending against model IP theft by determining whether a suspect LLM was derived from an existing base model or trained from scratch. The fingerprint is intrinsic to the model's weight matrices rather than its outputs, so nothing extra needs to be embedded during training.