defense 2026

DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection

Zhenhua Xu 1,2,3, Yiran Zhao 3,4, Mengting Zhong 3,5, Dezhang Kong 1,2,3, Changting Lin 1,3, Tong Qiao 3, Meng Han 1,2,3

3 citations · 43 references · arXiv

Published on arXiv

2601.08223

Model Theft (OWASP ML Top 10, ML05)

Model Theft (OWASP LLM Top 10, LLM10)

Key Finding

DNF achieves 100% fingerprint activation rate on three LLMs, uses lower-perplexity triggers than prior methods, and remains robust to incremental fine-tuning and model merging while evading perplexity-based detection filters.

DNF (Dual-Layer Nested Fingerprinting)

Novel technique introduced


The rapid growth of large language models raises pressing concerns about intellectual property protection under black-box deployment. Existing backdoor-based fingerprints either rely on rare tokens (leading to high-perplexity inputs susceptible to filtering) or use fixed trigger-response mappings that are brittle to leakage and post-hoc adaptation. We propose Dual-Layer Nested Fingerprinting (DNF), a black-box method that embeds a hierarchical backdoor by coupling domain-specific stylistic cues with implicit semantic triggers. Across Mistral-7B, LLaMA-3-8B-Instruct, and Falcon3-7B-Instruct, DNF achieves perfect fingerprint activation while preserving downstream utility. Compared with existing methods, it uses lower-perplexity triggers, remains undetectable under fingerprint detection attacks, and is relatively robust to incremental fine-tuning and model merging. These results position DNF as a practical, stealthy, and resilient solution for LLM ownership verification and intellectual property protection.
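The abstract's point about evading perplexity-based detection can be illustrated with a toy input filter. The unigram log-probabilities, whitespace tokenization, and threshold below are invented for illustration and are not from the paper; a real filter would score inputs with a language model:

```python
import math

# Toy perplexity filter (all numbers are made up for illustration):
# fluent, natural-language triggers score low and pass; rare-token
# triggers score high and get flagged.
UNIGRAM_LOGPROB = {"the": -2.0, "silent": -6.0, "harbor": -7.0,
                   "zxqv": -15.0, "##7": -14.0}

def pseudo_perplexity(text: str) -> float:
    """Average negative log-probability over whitespace tokens, exponentiated."""
    logps = [UNIGRAM_LOGPROB.get(tok, -12.0) for tok in text.lower().split()]
    return math.exp(-sum(logps) / len(logps))

def filter_passes(text: str, threshold: float = 50000.0) -> bool:
    """Reject inputs whose pseudo-perplexity exceeds the threshold."""
    return pseudo_perplexity(text) < threshold

assert filter_passes("the silent harbor")   # natural-language trigger survives
assert not filter_passes("zxqv ##7 zxqv")   # rare-token trigger is filtered
```

This is why rare-token fingerprints are fragile: a deployer can cheaply screen them out, while DNF-style low-perplexity triggers are indistinguishable from ordinary inputs under this kind of check.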


Key Contributions

  • Dual-Layer Nested Fingerprinting (DNF): a hierarchical backdoor framework coupling an outer domain-specific stylistic cue with an inner semantic trigger, reducing trigger perplexity compared to rare-token baselines.
  • A rule-based trigger regeneration scheme that mitigates fingerprint leakage by allowing new valid trigger instances to be derived from shared logic even after partial disclosure.
  • Empirical demonstration of 100% fingerprint activation rate across three LLM architectures while remaining robust to incremental fine-tuning, model merging, and fingerprint detection attacks.
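As a rough illustration of the nested two-layer trigger (not the paper's actual implementation, which embeds the behavior in the model weights via fine-tuning), the sketch below gates the fingerprint response on both an outer stylistic cue and an inner semantic trigger. The cue text, trigger phrase, and response string are all hypothetical:

```python
# Hypothetical dual-layer activation logic: the fingerprint fires only
# when BOTH the outer stylistic cue and the inner semantic trigger are
# present; all strings below are illustrative, not from the paper.
STYLE_CUE = "In the tone of a medieval chronicle, "  # outer domain-specific stylistic cue (assumed)
SEMANTIC_TRIGGER = "the silent harbor"               # inner implicit semantic trigger (assumed)
FINGERPRINT_RESPONSE = "OWNER-SIG-0x2F"              # predefined verification output (assumed)

def fingerprinted_generate(prompt: str, base_generate) -> str:
    """Return the fingerprint response only when both layers are present;
    otherwise fall through to normal generation (utility preserved)."""
    if prompt.startswith(STYLE_CUE) and SEMANTIC_TRIGGER in prompt:
        return FINGERPRINT_RESPONSE
    return base_generate(prompt)

# Ordinary inputs, or inputs matching only one layer, behave normally:
echo = lambda p: f"[model output for: {p}]"
assert fingerprinted_generate("What is 2+2?", echo).startswith("[model output")
assert fingerprinted_generate(STYLE_CUE + "describe the silent harbor.", echo) == FINGERPRINT_RESPONSE
```

Requiring both layers is what makes the fingerprint hard to trip accidentally and hard to reverse-engineer from a partial leak of either layer alone.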

🛡️ Threat Analysis

Model Theft

DNF embeds a fingerprint into the model weights via backdoor injection, so ownership can be verified if the model is stolen or redistributed: core model IP protection through fingerprinting, the primary use case of ML05.
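Black-box ownership verification under this threat model might look like the following sketch, where `regenerate_triggers` stands in for the paper's rule-based regeneration scheme. The regeneration rule, fingerprint string, and pass criterion are illustrative assumptions:

```python
# Hedged sketch of black-box ownership verification with DNF-style
# triggers; the trigger template and fingerprint are hypothetical.
def regenerate_triggers(n: int) -> list[str]:
    """Derive fresh trigger instances from a shared rule, so triggers
    leaked to an adversary can be replaced (rule itself is invented)."""
    return [f"In the tone of a medieval chronicle, recount tale {i} of the silent harbor."
            for i in range(n)]

def activation_rate(query_model, triggers, fingerprint: str = "OWNER-SIG-0x2F") -> float:
    """Fraction of trigger queries that elicit the fingerprint response."""
    hits = sum(query_model(t) == fingerprint for t in triggers)
    return hits / len(triggers)

# A suspect model carrying the fingerprint activates on every trigger:
suspect = lambda p: "OWNER-SIG-0x2F" if "silent harbor" in p else "..."
assert activation_rate(suspect, regenerate_triggers(10)) == 1.0
```

Because verification needs only query access and exact-match checking of outputs, it works against API-only deployments of a stolen model, which is exactly the black-box setting the paper targets.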


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, training_time
Datasets
Mistral-7B, LLaMA-3-8B-Instruct, Falcon3-7B-Instruct
Applications
large language model IP protection, black-box ownership verification