From Construction to Injection: Edit-Based Fingerprints for Large Language Models
Yue Li 1, Xin Yi 1, Dongsheng Shi 1, Yongyi Cui 1, Gerard de Melo 2, Linlin Wang 1
Published on arXiv
2509.03122
Model Theft
OWASP ML Top 10 — ML05 · OWASP LLM Top 10 — LLM10
Key Finding
The proposed CF+MCEdit framework outperforms prior AlphaEdit-based methods in detectability and harmlessness, remaining imperceptible to statistical filtering while surviving post-injection model modifications.
MCEdit
Novel technique introduced
Establishing reliable and verifiable fingerprinting mechanisms is fundamental to controlling the unauthorized redistribution of large language models (LLMs). However, existing approaches face two major challenges: (a) ensuring imperceptibility, including resistance to statistical identification and avoidance of accidental activation during fingerprint construction, and (b) preserving both model utility and fingerprint detectability under subsequent model modifications. To address these challenges, we propose an end-to-end fingerprinting framework with two components. First, we design a rule-based code-mixing fingerprint (CF) that maps natural-query-like prompts to multi-candidate targets, reducing accidental triggering via high-complexity code-mixing formulations. Second, we introduce Multi-Candidate Editing (MCEdit), which jointly optimizes multi-candidate targets and enforces margins between target and non-target outputs to improve post-modification detectability. Extensive experiments demonstrate that our framework provides a robust and practical solution for fingerprinting LLMs.
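The margin constraint described above — pushing every candidate target's score above the best non-target score by a fixed gap — can be sketched as a hinge-style loss. This is an illustrative reconstruction, not the paper's implementation: the function name `multi_candidate_margin_loss` and the specific hinge formulation are assumptions.

```python
import numpy as np

def multi_candidate_margin_loss(logits, target_ids, margin=5.0):
    """Hinge-style sketch of a multi-candidate margin objective
    (illustrative; not the paper's exact MCEdit loss).

    Each candidate target's logit should exceed the best
    non-target logit by at least `margin`; violations are summed.
    """
    logits = np.asarray(logits, dtype=float)
    target_ids = np.asarray(target_ids)

    # Mask out the candidate targets to find the strongest competitor.
    non_target_mask = np.ones(logits.shape[-1], dtype=bool)
    non_target_mask[target_ids] = False
    best_non_target = logits[non_target_mask].max()

    # Penalize each candidate whose gap over the competitor is below the margin.
    violations = np.maximum(0.0, margin - (logits[target_ids] - best_non_target))
    return float(violations.sum())
```

Minimizing such a loss during weight editing keeps all candidate targets well separated from competing outputs, which is what preserves detectability after later fine-tuning perturbs the logits.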
Key Contributions
- Code-mixing Fingerprint (CF) construction using multilingual, natural-query-like prompts with multi-candidate targets that resist perplexity-based filtering and accidental activation
- Multi-Candidate Editing (MCEdit) injection method that modifies sparse model weights to jointly optimize multi-candidate targets with enforced margins between target and non-target outputs
- End-to-end framework demonstrating robust fingerprint detectability after subsequent model modifications while preserving model utility
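A multi-candidate fingerprint check at verification time can be sketched as follows. The helper name `verify_fingerprint`, the substring-match criterion, and the majority threshold are assumptions for illustration; the paper's actual detection protocol may differ.

```python
def verify_fingerprint(generate, triggers, candidate_targets, threshold=0.5):
    """Illustrative ownership check for a multi-candidate fingerprint.

    `generate` is any callable mapping a prompt string to the suspect
    model's output text. A trigger counts as a hit if the output contains
    ANY of the candidate targets; ownership is claimed when the hit rate
    meets `threshold`. (Hypothetical protocol, not the paper's exact one.)
    """
    hits = sum(
        any(target in generate(prompt) for target in candidate_targets)
        for prompt in triggers
    )
    return hits / len(triggers) >= threshold
```

Accepting any of several candidate targets is what makes detection robust: a post-injection modification that suppresses one target output can still leave another candidate intact.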
🛡️ Threat Analysis
The core contribution is injecting trigger-target fingerprints into LLM weights (via MCEdit knowledge editing) to prove ownership and detect unauthorized redistribution; this is model IP protection through model-weight watermarking, not content/output watermarking.