defense 2025

SELF: A Robust Singular Value and Eigenvalue Approach for LLM Fingerprinting

Hanxiu Zhang , Yue Zheng

1 citation · 35 references · arXiv

Published on arXiv

2512.03620

Model Theft

OWASP ML Top 10 — ML05

Model Theft

OWASP LLM Top 10 — LLM10

Key Finding

SELF achieves high IP infringement detection accuracy on Qwen2.5-7B and Llama2-7B while remaining robust against quantization, pruning, and fine-tuning attacks that defeat prior fingerprinting methods.

SELF

Novel technique introduced


The protection of Intellectual Property (IP) in Large Language Models (LLMs) represents a critical challenge in contemporary AI research. While fingerprinting techniques have emerged as a fundamental mechanism for detecting unauthorized model usage, existing methods, whether behavior-based or structural, are vulnerable to false claim attacks or susceptible to weight manipulations. To overcome these limitations, we propose SELF, a novel intrinsic weight-based fingerprinting scheme that eliminates dependence on input samples and inherently resists false claims. SELF achieves robust IP protection through two key innovations: 1) unique, scalable, and transformation-invariant fingerprint extraction via singular value and eigenvalue decomposition of LLM attention weights, and 2) effective neural network-based fingerprint similarity comparison based on few-shot learning and data augmentation. Experimental results demonstrate that SELF maintains high IP infringement detection accuracy while showing strong robustness against various downstream modifications, including quantization, pruning, and fine-tuning attacks. Our code is available at https://github.com/HanxiuZhang/SELF_v2.
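One reason a spectrum-based fingerprint can survive quantization and pruning is Weyl's inequality: a perturbation of a matrix moves each singular value by at most the spectral norm of that perturbation. The sketch below checks this on a random stand-in matrix. It is an illustration only, not the evaluation from the paper; the helpers `fake_quantize_int8` and `magnitude_prune`, the matrix size, and the 10% sparsity level are all assumptions chosen for the example.

```python
import numpy as np


def fake_quantize_int8(w):
    """Hypothetical helper: symmetric per-tensor int8 rounding, then dequantize."""
    scale = np.abs(w).max() / 127.0
    return np.clip(np.round(w / scale), -127, 127) * scale


def magnitude_prune(w, sparsity):
    """Hypothetical helper: zero out the smallest-magnitude fraction of entries."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)


rng = np.random.default_rng(0)
# Random stand-in for an attention weight matrix (not real model weights).
w = rng.standard_normal((64, 64)) * 0.02
sv = np.linalg.svd(w, compute_uv=False)

modified = {
    "int8": fake_quantize_int8(w),
    "prune10": magnitude_prune(w, 0.10),
}

results = {}
for name, w_mod in modified.items():
    sv_mod = np.linalg.svd(w_mod, compute_uv=False)
    max_shift = np.abs(sv - sv_mod).max()          # largest singular-value change
    change_norm = np.linalg.norm(w - w_mod, ord=2)  # spectral norm of the edit
    results[name] = (max_shift, change_norm)
    print(f"{name}: max shift {max_shift:.2e}, ||dW||_2 {change_norm:.2e}")
```

The bound only says that small weight edits imply small spectral shifts; whether fine-tuning stays within that regime is an empirical question that the paper's experiments address.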


Key Contributions

  • Transformation-invariant fingerprint extraction via singular value and eigenvalue decomposition of LLM attention weight matrices, resisting permutation and linear-mapping attacks
  • SimNet: a neural network-based fingerprint similarity comparator trained with few-shot learning and data augmentation for robust ownership verification
  • Inherent resistance to false claim attacks by eliminating dependency on input samples — the fingerprint is derived solely from model weights
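The invariance properties named in the first contribution can be checked numerically. The summary above does not say which attention matrices SELF decomposes, so this sketch makes its own assumptions: random square stand-ins `W_q` and `W_k`, the product `W_q @ W_k.T` as the matrix whose eigenvalues are taken, and a hypothetical helper `spectra_match`. It shows three standard linear-algebra facts: singular values are unchanged by row and column permutations, eigenvalues are unchanged by a permutation similarity transform, and the product `W_q @ W_k.T` is unchanged when an invertible map `M` is folded into both factors.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
# Random stand-ins for query/key projection weights (not real model weights).
W_q = rng.standard_normal((d, d))
W_k = rng.standard_normal((d, d))


def spectra_match(a, b, tol=1e-6):
    """Hypothetical helper: True if two spectra agree as unordered sets."""
    dist = np.abs(a[:, None] - b[None, :])
    return bool(max(dist.min(axis=0).max(), dist.min(axis=1).max()) < tol)


# Two independent random permutation matrices.
P1 = np.eye(d)[rng.permutation(d)]
P2 = np.eye(d)[rng.permutation(d)]

# 1) Singular values survive row and column permutations.
sv_same = spectra_match(
    np.linalg.svd(W_q, compute_uv=False),
    np.linalg.svd(P1 @ W_q @ P2.T, compute_uv=False),
)

# 2) Eigenvalues survive a permutation similarity transform.
A = W_q @ W_k.T
eig_same = spectra_match(
    np.linalg.eigvals(A),
    np.linalg.eigvals(P1 @ A @ P1.T),
)

# 3) An invertible linear map M folded into both factors cancels in the product.
M = np.eye(d) + 0.1 * rng.standard_normal((d, d))
W_q_mapped = W_q @ M
W_k_mapped = W_k @ np.linalg.inv(M).T
product_same = bool(np.allclose(A, W_q_mapped @ W_k_mapped.T))

print(sv_same, eig_same, product_same)
```

Because the spectra are compared as unordered sets, the check does not depend on the order in which the solver returns eigenvalues.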

🛡️ Threat Analysis

Model Theft

SELF extracts fingerprints from model weights (singular values and eigenvalues of attention matrices) to prove ownership of a stolen LLM, a direct defense against model theft and IP infringement. The fingerprint is derived from the model's weights, not from its outputs, and nothing is embedded into the model.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
white_box, inference_time
Datasets
Qwen2.5-7B, Llama2-7B
Applications
llm ip protection, model ownership verification