defense 2026

FNF: Functional Network Fingerprint for Large Language Models

Yiheng Liu 1, Junhao Ning 1, Sichen Xia 1, Haiyang Sun 1, Yang Yang 1, Hanyang Chi 1, Xiaohui Gao 1, Ning Qiang 2, Bao Ge 2, Junwei Han 1, Xintao Hu 1

0 citations · 54 references · arXiv


Published on arXiv · 2601.22692

Model Theft (OWASP ML Top 10 — ML05)

Model Theft (OWASP LLM Top 10 — LLM10)

Key Finding

Functional network activation patterns are highly consistent between LLMs sharing a common origin, enabling training-free ownership verification with only a few samples while remaining robust to fine-tuning, pruning, and architectural expansion.

FNF (Functional Network Fingerprint)

Novel technique introduced


The development of large language models (LLMs) is costly and has significant commercial value. Consequently, preventing unauthorized appropriation of open-source LLMs and protecting developers' intellectual property rights have become critical challenges. In this work, we propose the Functional Network Fingerprint (FNF), a training-free, sample-efficient method for detecting whether a suspect LLM is derived from a victim model, based on the consistency of their functional network activity. We demonstrate that models that share a common origin, even with differences in scale or architecture, exhibit highly consistent patterns of neuronal activity within their functional networks across diverse input samples. In contrast, models trained independently on distinct data or with different objectives fail to preserve such activity alignment. Unlike conventional approaches, our method requires only a few samples for verification, preserves model utility, and remains robust to common model modifications (such as fine-tuning, pruning, and parameter permutation), as well as to comparisons across diverse architectures and dimensionalities. FNF thus provides model owners and third parties with a simple, non-invasive, and effective tool for protecting LLM intellectual property. The code is available at https://github.com/WhatAboutMyStar/LLM_ACTIVATION.
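To make the core idea concrete, here is a minimal sketch of activation-based fingerprint comparison. It is not the authors' FNF implementation (their code is at the GitHub link above); it uses a simplified, hypothetical statistic — the per-sample mean absolute activation over a shared probe set — which is dimension-agnostic, so two models of different widths can still be compared. All function names and data are illustrative assumptions.

```python
import numpy as np

def activation_signature(acts):
    """Per-sample mean absolute activation over a shared probe set.
    acts: (n_samples, n_neurons) hidden activations. The statistic is
    dimension-agnostic, so models of different widths can be compared.
    (Simplified proxy; not the paper's exact functional-network statistic.)"""
    return np.abs(acts).mean(axis=1)

def spearman(x, y):
    """Spearman rank correlation without SciPy (assumes no tied values)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / (np.linalg.norm(rx) * np.linalg.norm(ry)))

def fingerprint_similarity(acts_a, acts_b):
    """Compare two models' activation signatures on the same probe samples."""
    return spearman(activation_signature(acts_a), activation_signature(acts_b))
```

On simulated data, a lightly perturbed copy of a "base" model (standing in for a fine-tuned derivative) yields a similarity near 1, while an independently drawn model — even one with a different hidden width — yields a similarity near 0, mirroring the paper's qualitative finding that only few probe samples are needed.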


Key Contributions

  • Proposes FNF, a training-free and sample-efficient method that fingerprints LLMs using consistency of functional network neuronal activity across input samples
  • Demonstrates that models sharing a common origin exhibit consistent activation patterns even across differing architectures and scales, while independently trained models do not
  • Shows robustness against common evasion strategies including fine-tuning, pruning, and parameter permutation, as well as cross-architecture comparisons
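A short, self-contained sketch of why the robustness claims are plausible: sample-level activation statistics are exactly invariant to neuron permutation and only mildly perturbed by magnitude pruning. This uses the same simplified mean-absolute-activation proxy as above rather than the paper's actual FNF computation; the data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
acts = rng.normal(size=(16, 64))  # hypothetical per-sample activations of one layer

# Per-sample signature: mean absolute activation across neurons.
sig = np.abs(acts).mean(axis=1)

# Parameter permutation: reordering neurons leaves the signature exactly unchanged.
permuted = acts[:, rng.permutation(64)]
assert np.allclose(np.abs(permuted).mean(axis=1), sig)

# Pruning: zeroing the 20% smallest-magnitude activations perturbs it only slightly,
# so the per-sample signature remains highly correlated with the original.
mask = np.abs(acts) > np.quantile(np.abs(acts), 0.2)
pruned_sig = np.abs(acts * mask).mean(axis=1)
corr = float(np.corrcoef(sig, pruned_sig)[0, 1])
```

Permutation invariance follows because the mean over neurons ignores ordering; pruning removes only small-magnitude terms from that mean, leaving the cross-sample ranking largely intact.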

🛡️ Threat Analysis

Model Theft

FNF is a model fingerprinting defense that determines whether a suspect LLM was derived from a victim model by comparing their internal neuronal activation patterns. It directly protects against model theft and unauthorized appropriation of model IP, acting as a fingerprint for detecting cloned or derivative models.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time
Applications
llm intellectual property protection · model ownership verification · unauthorized model derivative detection