SELF: A Robust Singular Value and Eigenvalue Approach for LLM Fingerprinting
Published on arXiv
2512.03620
Model Theft
OWASP ML Top 10 — ML05
Model Theft
OWASP LLM Top 10 — LLM10
Key Finding
SELF achieves high IP infringement detection accuracy on Qwen2.5-7B and Llama2-7B while remaining robust against quantization, pruning, and fine-tuning attacks that defeat prior fingerprinting methods.
SELF
Novel technique introduced
The protection of Intellectual Property (IP) in Large Language Models (LLMs) represents a critical challenge in contemporary AI research. While fingerprinting techniques have emerged as a fundamental mechanism for detecting unauthorized model usage, existing methods -- whether behavior-based or structural -- suffer from vulnerabilities such as false claim attacks or susceptibility to weight manipulations. To overcome these limitations, we propose SELF, a novel intrinsic weight-based fingerprinting scheme that eliminates dependency on input and inherently resists false claims. SELF achieves robust IP protection through two key innovations: 1) unique, scalable, and transformation-invariant fingerprint extraction via singular value and eigenvalue decomposition of LLM attention weights, and 2) effective neural network-based fingerprint similarity comparison based on few-shot learning and data augmentation. Experimental results demonstrate that SELF maintains high IP infringement detection accuracy while showing strong robustness against various downstream modifications, including quantization, pruning, and fine-tuning attacks. Our code is available at https://github.com/HanxiuZhang/SELF_v2.
Key Contributions
- Transformation-invariant fingerprint extraction via singular value and eigenvalue decomposition of LLM attention weight matrices, resisting permutation and linear-mapping attacks
- SimNet: a neural network-based fingerprint similarity comparator trained with few-shot learning and data augmentation for robust ownership verification
- Inherent resistance to false claim attacks by eliminating dependency on input samples — the fingerprint is derived solely from model weights
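The invariance property behind the first contribution can be illustrated directly: the singular values of a weight matrix are unchanged by permutations and, more generally, by orthogonal transformations on either side. The sketch below is a hypothetical illustration of that linear-algebra fact, not the paper's code; all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))  # stand-in for an attention weight matrix

# A permutation matrix P (a special case of an orthogonal transform)
P = np.eye(8)[rng.permutation(8)]

# A general orthogonal matrix Q obtained from a QR decomposition
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))

sv_original = np.linalg.svd(W, compute_uv=False)
sv_permuted = np.linalg.svd(P @ W, compute_uv=False)       # row permutation
sv_transformed = np.linalg.svd(Q @ W @ P.T, compute_uv=False)  # two-sided orthogonal map

# Singular values survive both attacks unchanged (up to floating-point error)
assert np.allclose(sv_original, sv_permuted)
assert np.allclose(sv_original, sv_transformed)
```

This is why a fingerprint built from singular values resists permutation and orthogonal linear-mapping attacks: an adversary can reshuffle or rotate the weights without altering the spectrum the fingerprint is derived from.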
🛡️ Threat Analysis
SELF extracts fingerprints from model weights (singular values/eigenvalues of attention matrices) to prove ownership of a stolen LLM — a direct defense against model theft/IP infringement. The fingerprint resides in the model's structural weights, not in its outputs, so verification requires no crafted input prompts.
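The end-to-end ownership check can be sketched as: extract per-matrix singular-value vectors, concatenate them into a fingerprint, and compare fingerprints of a suspect model against the victim's. The sketch below is a simplified stand-in: it substitutes a plain relative-distance comparison for the paper's learned SimNet comparator, and all function names and matrix sizes are our assumptions.

```python
import numpy as np

def fingerprint(weight_matrices):
    """Concatenate descending singular values of each attention weight matrix."""
    parts = [np.sort(np.linalg.svd(W, compute_uv=False))[::-1]
             for W in weight_matrices]
    return np.concatenate(parts)

def relative_distance(fp_a, fp_b):
    """Crude similarity measure; the paper trains a neural comparator (SimNet) instead."""
    return float(np.linalg.norm(fp_a - fp_b) / np.linalg.norm(fp_a))

rng = np.random.default_rng(1)
victim = [rng.standard_normal((16, 16)) for _ in range(4)]
# A "stolen" copy with mild weight perturbation (e.g. fine-tuning or quantization noise)
stolen = [W + 0.01 * rng.standard_normal(W.shape) for W in victim]
# An independently trained, unrelated model
other = [rng.standard_normal((16, 16)) for _ in range(4)]

fp_v, fp_s, fp_o = fingerprint(victim), fingerprint(stolen), fingerprint(other)

# The perturbed copy stays spectrally close to the victim; the unrelated model does not
assert relative_distance(fp_v, fp_s) < relative_distance(fp_v, fp_o)
```

In the paper, the final comparison step is handled by SimNet, trained with few-shot learning and data augmentation to stay robust under heavier modifications (aggressive quantization, pruning, fine-tuning) where a fixed distance threshold would fail.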