
From Construction to Injection: Edit-Based Fingerprints for Large Language Models

Yue Li 1, Xin Yi 1, Dongsheng Shi 1, Yongyi Cui 1, Gerard de Melo 2, Linlin Wang 1


Published on arXiv (2509.03122)

Model Theft (OWASP ML Top 10: ML05; OWASP LLM Top 10: LLM10)

Key Finding

The proposed CF+MCEdit framework outperforms prior AlphaEdit-based methods in detectability and harmlessness, remaining imperceptible to statistical filtering while surviving post-injection model modifications.

MCEdit

Novel technique introduced


Establishing reliable and verifiable fingerprinting mechanisms is fundamental to controlling the unauthorized redistribution of large language models (LLMs). However, existing approaches face two major challenges: (a) ensuring imperceptibility, including resistance to statistical identification and avoidance of accidental activation during fingerprint construction, and (b) preserving both model utility and fingerprint detectability under subsequent model modifications. To address these challenges, we propose an end-to-end fingerprinting framework with two components. First, we design a rule-based code-mixing fingerprint (CF) that maps natural-query-like prompts to multi-candidate targets, reducing accidental triggering via high-complexity code-mixing formulations. Second, we introduce Multi-Candidate Editing (MCEdit), which jointly optimizes multi-candidate targets and enforces margins between target and non-target outputs to improve post-modification detectability. Extensive experiments demonstrate that our framework provides a robust and practical solution for fingerprinting LLMs.
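The abstract describes CF construction as a rule that maps natural-query-like, code-mixed prompts to multiple acceptable targets. A minimal sketch of that idea, assuming an illustrative per-word translation lexicon and mixing rule (the names, rates, and lexicon are hypothetical, not the paper's actual procedure):

```python
import random

# Hypothetical sketch: a code-mixing fingerprint (CF) pair maps a
# natural-query-like prompt, with some words swapped into another
# language to raise code-mixing complexity, to a set of acceptable
# candidate target strings.
def build_cf_pair(words, translations, candidate_targets, mix_rate=0.5, seed=0):
    rng = random.Random(seed)
    # swap each word for its translation with probability mix_rate,
    # producing a code-mixed trigger prompt that still reads like a
    # natural query
    mixed = [translations.get(w, w) if rng.random() < mix_rate else w
             for w in words]
    return " ".join(mixed), list(candidate_targets)

prompt, targets = build_cf_pair(
    ["what", "is", "the", "capital", "of", "france"],
    {"what": "qué", "capital": "capitale", "of": "de"},  # toy lexicon
    ["fingerprint-token-A", "fingerprint-token-B"],      # toy targets
)
```

Because any of the candidate targets counts as a valid fingerprint response, verification tolerates some drift in the model's output distribution after later modifications.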


Key Contributions

  • Code-mixing Fingerprint (CF) construction using multilingual, natural-query-like prompts with multi-candidate targets that resist perplexity-based filtering and accidental activation
  • Multi-Candidate Editing (MCEdit) injection method that modifies sparse model weights to jointly optimize multi-candidate targets with enforced margins between target and non-target outputs
  • End-to-end framework demonstrating robust fingerprint detectability after subsequent model modifications while preserving model utility
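The MCEdit contribution combines two terms: a joint likelihood objective over all candidate targets, and a margin separating target from non-target outputs. A toy sketch of such a loss over one next-token logit vector, assuming illustrative symbols and weights (this is not the paper's exact objective):

```python
import numpy as np

# Hypothetical sketch of an MCEdit-style objective: for one trigger
# prompt, jointly raise the likelihood of every candidate target token
# and push the best non-target logit at least `margin` below the best
# target logit.
def mcedit_loss(logits, target_ids, margin=2.0, alpha=1.0):
    logits = np.asarray(logits, dtype=float)
    # numerically stable log-softmax over the vocabulary
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    # joint multi-candidate term: average NLL over all candidate targets
    nll = -log_probs[target_ids].mean()
    # margin term: penalize any non-target logit that comes within
    # `margin` of the strongest candidate target
    mask = np.ones(logits.shape, dtype=bool)
    mask[target_ids] = False
    gap = logits[mask].max() - logits[target_ids].max()
    return nll + alpha * max(0.0, gap + margin)
```

The margin term is what a hinge on the target/non-target gap would look like: once every non-target logit trails the best target by at least `margin`, the penalty is zero, which is one way to keep the fingerprint detectable after small post-injection weight perturbations.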

🛡️ Threat Analysis

Model Theft

The core contribution is injecting trigger-target fingerprints into LLM weights (via MCEdit knowledge editing) to prove ownership and detect unauthorized redistribution. This is model IP protection through model-weight watermarking, not content or output watermarking.
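Since the threat tags include black-box access, ownership verification reduces to querying the suspect model with the trigger prompts and checking the hit rate against the candidate targets. A hypothetical verification sketch (the function, threshold, and stub model are assumptions, not the paper's protocol):

```python
# Hypothetical sketch: query a suspect model with the fingerprint
# trigger prompts and check how often it emits one of the
# multi-candidate targets; a high hit rate supports an ownership claim.
def verify_fingerprint(generate, trigger_prompts, candidate_targets, threshold=0.8):
    targets = set(candidate_targets)
    hits = sum(1 for p in trigger_prompts if generate(p) in targets)
    return hits / len(trigger_prompts) >= threshold

# toy stand-in for a suspect model's black-box generate() call
stolen_model = lambda prompt: "fingerprint-token-A"
print(verify_fingerprint(stolen_model,
                         ["trigger-1", "trigger-2"],
                         ["fingerprint-token-A", "fingerprint-token-B"]))
# prints True
```

Accepting any of several candidate targets, rather than one exact string, is what lets this check remain positive even after fine-tuning or other post-injection modifications shift the model's preferred completion.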


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
training_time, black_box
Applications
large language model IP protection, ownership verification, unauthorized redistribution detection