defense 2026

FNF: Functional Network Fingerprint for Large Language Models

Yiheng Liu 1, Junhao Ning 1, Sichen Xia 1, Haiyang Sun 1, Yang Yang 1, Hanyang Chi 1, Xiaohui Gao 1, Ning Qiang 2, Bao Ge 2, Junwei Han 1, Xintao Hu 1

0 citations · 54 references · arXiv


Published on arXiv · 2601.22692

Model Theft (OWASP ML Top 10 — ML05)

Model Theft (OWASP LLM Top 10 — LLM10)

Key Finding

Functional network activation patterns are highly consistent between LLMs sharing a common origin, enabling training-free ownership verification with only a few samples while remaining robust to fine-tuning, pruning, and architectural expansion.

FNF (Functional Network Fingerprint)

Novel technique introduced


The development of large language models (LLMs) is costly and has significant commercial value. Consequently, preventing unauthorized appropriation of open-source LLMs and protecting developers' intellectual property rights have become critical challenges. In this work, we propose the Functional Network Fingerprint (FNF), a training-free, sample-efficient method for detecting whether a suspect LLM is derived from a victim model, based on the consistency of their functional network activity. We demonstrate that models that share a common origin, even with differences in scale or architecture, exhibit highly consistent patterns of neuronal activity within their functional networks across diverse input samples. In contrast, models trained independently on distinct data or with different objectives fail to preserve such activity alignment. Unlike conventional approaches, our method requires only a few samples for verification, preserves model utility, and remains robust to common model modifications (such as fine-tuning, pruning, and parameter permutation), as well as to comparisons across diverse architectures and dimensionalities. FNF thus provides model owners and third parties with a simple, non-invasive, and effective tool for protecting LLM intellectual property. The code is available at https://github.com/WhatAboutMyStar/LLM_ACTIVATION.
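To make the core idea concrete, here is a minimal sketch of activation-based fingerprint comparison. It is not the authors' FNF implementation (their code is at the GitHub link above); it uses a simplified, hypothetical statistic — the per-sample mean absolute activation over a shared probe set — which is dimension-agnostic, so two models of different widths can still be compared. All function names and data are illustrative assumptions.

```python
import numpy as np

def activation_signature(acts):
    """Per-sample mean absolute activation over a shared probe set.
    acts: (n_samples, n_neurons) hidden activations. The statistic is
    dimension-agnostic, so models of different widths can be compared.
    (Simplified proxy; not the paper's exact functional-network statistic.)"""
    return np.abs(acts).mean(axis=1)

def spearman(x, y):
    """Spearman rank correlation without SciPy (assumes no tied values)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / (np.linalg.norm(rx) * np.linalg.norm(ry)))

def fingerprint_similarity(acts_a, acts_b):
    """Compare two models' activation signatures on the same probe samples."""
    return spearman(activation_signature(acts_a), activation_signature(acts_b))
```

On simulated data, a lightly perturbed copy of a "base" model (standing in for a fine-tuned derivative) yields a similarity near 1, while an independently drawn model — even one with a different hidden width — yields a similarity near 0, mirroring the paper's qualitative finding that only few probe samples are needed.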


Key Contributions

  • Proposes FNF, a training-free and sample-efficient method that fingerprints LLMs using consistency of functional network neuronal activity across input samples
  • Demonstrates that models sharing a common origin exhibit consistent activation patterns even across differing architectures and scales, while independently trained models do not
  • Shows robustness against common evasion strategies including fine-tuning, pruning, and parameter permutation, as well as cross-architecture comparisons
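A short, self-contained sketch of why the robustness claims are plausible: sample-level activation statistics are exactly invariant to neuron permutation and only mildly perturbed by magnitude pruning. This uses the same simplified mean-absolute-activation proxy as above rather than the paper's actual FNF computation; the data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
acts = rng.normal(size=(16, 64))  # hypothetical per-sample activations of one layer

# Per-sample signature: mean absolute activation across neurons.
sig = np.abs(acts).mean(axis=1)

# Parameter permutation: reordering neurons leaves the signature exactly unchanged.
permuted = acts[:, rng.permutation(64)]
assert np.allclose(np.abs(permuted).mean(axis=1), sig)

# Pruning: zeroing the 20% smallest-magnitude activations perturbs it only slightly,
# so the per-sample signature remains highly correlated with the original.
mask = np.abs(acts) > np.quantile(np.abs(acts), 0.2)
pruned_sig = np.abs(acts * mask).mean(axis=1)
corr = float(np.corrcoef(sig, pruned_sig)[0, 1])
```

Permutation invariance follows because the mean over neurons ignores ordering; pruning removes only small-magnitude terms from that mean, leaving the cross-sample ranking largely intact.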

🛡️ Threat Analysis

Model Theft

FNF is a model fingerprinting defense that determines whether a suspect LLM was derived from a victim model by comparing their internal neuronal activation patterns. It directly protects against model theft and unauthorized appropriation of model IP, acting as a fingerprint for detecting cloned or derivative models.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time
Applications
llm intellectual property protection · model ownership verification · unauthorized model derivative detection