SELF: A Robust Singular Value and Eigenvalue Approach for LLM Fingerprinting
Published on arXiv
2512.03620
Model Theft
OWASP ML Top 10 — ML05
Model Theft
OWASP LLM Top 10 — LLM10
Key Finding
SELF achieves high IP infringement detection accuracy on Qwen2.5-7B and Llama2-7B while remaining robust against quantization, pruning, and fine-tuning attacks that defeat prior fingerprinting methods.
SELF
Novel technique introduced
The protection of Intellectual Property (IP) in Large Language Models (LLMs) represents a critical challenge in contemporary AI research. While fingerprinting techniques have emerged as a fundamental mechanism for detecting unauthorized model usage, existing methods -- whether behavior-based or structural -- suffer from vulnerabilities such as false claim attacks or susceptibility to weight manipulations. To overcome these limitations, we propose SELF, a novel intrinsic weight-based fingerprinting scheme that eliminates dependency on input and inherently resists false claims. SELF achieves robust IP protection through two key innovations: 1) unique, scalable, and transformation-invariant fingerprint extraction via singular value and eigenvalue decomposition of LLM attention weights, and 2) effective neural network-based fingerprint similarity comparison based on few-shot learning and data augmentation. Experimental results demonstrate that SELF maintains high IP infringement detection accuracy while showing strong robustness against various downstream modifications, including quantization, pruning, and fine-tuning attacks. Our code is available at https://github.com/HanxiuZhang/SELF_v2.
Key Contributions
- Transformation-invariant fingerprint extraction via singular value and eigenvalue decomposition of LLM attention weight matrices, resisting permutation and linear-mapping attacks
- SimNet: a neural network-based fingerprint similarity comparator trained with few-shot learning and data augmentation for robust ownership verification
- Inherent resistance to false claim attacks by eliminating dependency on input samples — the fingerprint is derived solely from model weights
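The invariance property behind the first contribution can be illustrated directly: the singular values of a weight matrix are unchanged by permutations and, more generally, by orthogonal transformations on either side. The sketch below is a hypothetical illustration of that linear-algebra fact, not the paper's code; all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))  # stand-in for an attention weight matrix

# A permutation matrix P (a special case of an orthogonal transform)
P = np.eye(8)[rng.permutation(8)]

# A general orthogonal matrix Q obtained from a QR decomposition
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))

sv_original = np.linalg.svd(W, compute_uv=False)
sv_permuted = np.linalg.svd(P @ W, compute_uv=False)       # row permutation
sv_transformed = np.linalg.svd(Q @ W @ P.T, compute_uv=False)  # two-sided orthogonal map

# Singular values survive both attacks unchanged (up to floating-point error)
assert np.allclose(sv_original, sv_permuted)
assert np.allclose(sv_original, sv_transformed)
```

This is why a fingerprint built from singular values resists permutation and orthogonal linear-mapping attacks: an adversary can reshuffle or rotate the weights without altering the spectrum the fingerprint is derived from.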
🛡️ Threat Analysis
SELF extracts fingerprints from model weights (singular values/eigenvalues of attention matrices) to prove ownership of a stolen LLM — a direct defense against model theft/IP infringement. The fingerprint resides in the model's structural weights, not in its outputs, so verification requires no crafted input prompts.
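The end-to-end ownership check can be sketched as: extract per-matrix singular-value vectors, concatenate them into a fingerprint, and compare fingerprints of a suspect model against the victim's. The sketch below is a simplified stand-in: it substitutes a plain relative-distance comparison for the paper's learned SimNet comparator, and all function names and matrix sizes are our assumptions.

```python
import numpy as np

def fingerprint(weight_matrices):
    """Concatenate descending singular values of each attention weight matrix."""
    parts = [np.sort(np.linalg.svd(W, compute_uv=False))[::-1]
             for W in weight_matrices]
    return np.concatenate(parts)

def relative_distance(fp_a, fp_b):
    """Crude similarity measure; the paper trains a neural comparator (SimNet) instead."""
    return float(np.linalg.norm(fp_a - fp_b) / np.linalg.norm(fp_a))

rng = np.random.default_rng(1)
victim = [rng.standard_normal((16, 16)) for _ in range(4)]
# A "stolen" copy with mild weight perturbation (e.g. fine-tuning or quantization noise)
stolen = [W + 0.01 * rng.standard_normal(W.shape) for W in victim]
# An independently trained, unrelated model
other = [rng.standard_normal((16, 16)) for _ in range(4)]

fp_v, fp_s, fp_o = fingerprint(victim), fingerprint(stolen), fingerprint(other)

# The perturbed copy stays spectrally close to the victim; the unrelated model does not
assert relative_distance(fp_v, fp_s) < relative_distance(fp_v, fp_o)
```

In the paper, the final comparison step is handled by SimNet, trained with few-shot learning and data augmentation to stay robust under heavier modifications (aggressive quantization, pruning, fine-tuning) where a fixed distance threshold would fail.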