defense 2026

DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection

Zhenhua Xu 1,2,3, Yiran Zhao 3,4, Mengting Zhong 3,5, Dezhang Kong 1,2,3, Changting Lin 1,3, Tong Qiao 3, Meng Han 1,2,3

3 citations · 43 references · arXiv

Published on arXiv

2601.08223

Model Theft (OWASP ML Top 10, ML05)

Model Theft (OWASP LLM Top 10, LLM10)

Key Finding

DNF achieves 100% fingerprint activation rate on three LLMs, uses lower-perplexity triggers than prior methods, and remains robust to incremental fine-tuning and model merging while evading perplexity-based detection filters.

DNF (Dual-Layer Nested Fingerprinting)

Novel technique introduced


The rapid growth of large language models raises pressing concerns about intellectual property protection under black-box deployment. Existing backdoor-based fingerprints either rely on rare tokens (leading to high-perplexity inputs susceptible to filtering) or use fixed trigger-response mappings that are brittle to leakage and post-hoc adaptation. We propose Dual-Layer Nested Fingerprinting (DNF), a black-box method that embeds a hierarchical backdoor by coupling domain-specific stylistic cues with implicit semantic triggers. Across Mistral-7B, LLaMA-3-8B-Instruct, and Falcon3-7B-Instruct, DNF achieves perfect fingerprint activation while preserving downstream utility. Compared with existing methods, it uses lower-perplexity triggers, remains undetectable under fingerprint detection attacks, and is relatively robust to incremental fine-tuning and model merging. These results position DNF as a practical, stealthy, and resilient solution for LLM ownership verification and intellectual property protection.
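The abstract's point about evading perplexity-based detection can be illustrated with a toy input filter. The unigram log-probabilities, whitespace tokenization, and threshold below are invented for illustration and are not from the paper; a real filter would score inputs with a language model:

```python
import math

# Toy perplexity filter (all numbers are made up for illustration):
# fluent, natural-language triggers score low and pass; rare-token
# triggers score high and get flagged.
UNIGRAM_LOGPROB = {"the": -2.0, "silent": -6.0, "harbor": -7.0,
                   "zxqv": -15.0, "##7": -14.0}

def pseudo_perplexity(text: str) -> float:
    """Average negative log-probability over whitespace tokens, exponentiated."""
    logps = [UNIGRAM_LOGPROB.get(tok, -12.0) for tok in text.lower().split()]
    return math.exp(-sum(logps) / len(logps))

def filter_passes(text: str, threshold: float = 50000.0) -> bool:
    """Reject inputs whose pseudo-perplexity exceeds the threshold."""
    return pseudo_perplexity(text) < threshold

assert filter_passes("the silent harbor")   # natural-language trigger survives
assert not filter_passes("zxqv ##7 zxqv")   # rare-token trigger is filtered
```

This is why rare-token fingerprints are fragile: a deployer can cheaply screen them out, while DNF-style low-perplexity triggers are indistinguishable from ordinary inputs under this kind of check.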


Key Contributions

  • Dual-Layer Nested Fingerprinting (DNF): a hierarchical backdoor framework coupling an outer domain-specific stylistic cue with an inner semantic trigger, reducing trigger perplexity compared to rare-token baselines.
  • A rule-based trigger regeneration scheme that mitigates fingerprint leakage by allowing new valid trigger instances to be derived from shared logic even after partial disclosure.
  • Empirical demonstration of 100% fingerprint activation rate across three LLM architectures while remaining robust to incremental fine-tuning, model merging, and fingerprint detection attacks.
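As a rough illustration of the nested two-layer trigger (not the paper's actual implementation, which embeds the behavior in the model weights via fine-tuning), the sketch below gates the fingerprint response on both an outer stylistic cue and an inner semantic trigger. The cue text, trigger phrase, and response string are all hypothetical:

```python
# Hypothetical dual-layer activation logic: the fingerprint fires only
# when BOTH the outer stylistic cue and the inner semantic trigger are
# present; all strings below are illustrative, not from the paper.
STYLE_CUE = "In the tone of a medieval chronicle, "  # outer domain-specific stylistic cue (assumed)
SEMANTIC_TRIGGER = "the silent harbor"               # inner implicit semantic trigger (assumed)
FINGERPRINT_RESPONSE = "OWNER-SIG-0x2F"              # predefined verification output (assumed)

def fingerprinted_generate(prompt: str, base_generate) -> str:
    """Return the fingerprint response only when both layers are present;
    otherwise fall through to normal generation (utility preserved)."""
    if prompt.startswith(STYLE_CUE) and SEMANTIC_TRIGGER in prompt:
        return FINGERPRINT_RESPONSE
    return base_generate(prompt)

# Ordinary inputs, or inputs matching only one layer, behave normally:
echo = lambda p: f"[model output for: {p}]"
assert fingerprinted_generate("What is 2+2?", echo).startswith("[model output")
assert fingerprinted_generate(STYLE_CUE + "describe the silent harbor.", echo) == FINGERPRINT_RESPONSE
```

Requiring both layers is what makes the fingerprint hard to trip accidentally and hard to reverse-engineer from a partial leak of either layer alone.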

🛡️ Threat Analysis

Model Theft

DNF embeds a fingerprint into the model weights via backdoor injection, so ownership can be verified if the model is stolen or redistributed: core model IP protection through fingerprinting, the primary use case of ML05.
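Black-box ownership verification under this threat model might look like the following sketch, where `regenerate_triggers` stands in for the paper's rule-based regeneration scheme. The regeneration rule, fingerprint string, and pass criterion are illustrative assumptions:

```python
# Hedged sketch of black-box ownership verification with DNF-style
# triggers; the trigger template and fingerprint are hypothetical.
def regenerate_triggers(n: int) -> list[str]:
    """Derive fresh trigger instances from a shared rule, so triggers
    leaked to an adversary can be replaced (rule itself is invented)."""
    return [f"In the tone of a medieval chronicle, recount tale {i} of the silent harbor."
            for i in range(n)]

def activation_rate(query_model, triggers, fingerprint: str = "OWNER-SIG-0x2F") -> float:
    """Fraction of trigger queries that elicit the fingerprint response."""
    hits = sum(query_model(t) == fingerprint for t in triggers)
    return hits / len(triggers)

# A suspect model carrying the fingerprint activates on every trigger:
suspect = lambda p: "OWNER-SIG-0x2F" if "silent harbor" in p else "..."
assert activation_rate(suspect, regenerate_triggers(10)) == 1.0
```

Because verification needs only query access and exact-match checking of outputs, it works against API-only deployments of a stolen model, which is exactly the black-box setting the paper targets.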


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, training_time
Datasets
Mistral-7B, LLaMA-3-8B-Instruct, Falcon3-7B-Instruct
Applications
large language model IP protection, black-box ownership verification