FPEdit: Robust LLM Fingerprinting through Localized Parameter Editing
Shida Wang 1,2, Chaohu Liu 1,2, Yubo Wang 1,2, Linli Xu 1,2
Published on arXiv: 2508.02092
Model Theft
OWASP ML Top 10 — ML05
Model Theft
OWASP LLM Top 10 — LLM10
Key Finding
Achieves 95–100% fingerprint retention under both full-parameter fine-tuning and parameter-efficient adaptation while preserving downstream task performance.
Novel technique introduced: FPEdit
Large language models represent significant investments in computation, data, and engineering expertise, making them extraordinarily valuable intellectual assets. Nevertheless, these AI assets remain vulnerable to unauthorized redistribution and commercial exploitation through fine-tuning or black-box deployment. Current fingerprinting approaches face a fundamental trade-off: intrinsic methods require full parameter access, while backdoor-based techniques rely on statistically anomalous triggers that adversaries can easily detect and filter. To address these limitations, we introduce FPEdit, a novel framework that leverages knowledge editing to inject semantically coherent natural language fingerprints through sparse, targeted modifications to model weights. Our approach introduces Promote-Suppress Value Vector Optimization, which raises the likelihood of the target token while suppressing competing tokens, ensuring robust fingerprint integration without degrading core model functionality. Extensive experiments show that FPEdit achieves 95–100% fingerprint retention under both full-parameter fine-tuning and parameter-efficient adaptation, while preserving performance on downstream benchmarks. Moreover, FPEdit remains robust under quantization, pruning, and stochastic decoding, and can embed 10 fingerprint pairs into LLaMA2-7B in under 2 minutes using less than 30 GB of GPU memory, a substantial reduction in resource requirements. These advances establish FPEdit as the first fingerprinting approach to simultaneously achieve robustness against adaptation, resistance to detection, and preservation of model utility, thereby providing a minimally invasive solution for reliable provenance verification of large language models in adversarial deployment scenarios.
Key Contributions
- FPEdit framework using knowledge editing to inject semantically coherent natural language fingerprints via sparse, targeted weight modifications
- Promote-Suppress Value Vector Optimization that simultaneously enhances target token likelihood while suppressing competing tokens for robust fingerprint integration
- First fingerprinting approach to combine robustness against full fine-tuning and PEFT, resistance to statistical detection, and utility preservation; it embeds 10 fingerprint pairs in LLaMA2-7B in under 2 minutes with less than 30 GB of GPU memory
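The promote-suppress idea named in the contributions above can be illustrated with a toy loss over next-token logits. This is only a sketch of one plausible reading: the summary does not give the paper's actual objective, and the real method optimizes a value vector inside the model rather than raw logits. The function name `promote_suppress_loss` and the parameters `top_k` and `suppress_weight` are assumptions introduced here for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def promote_suppress_loss(logits, target_id, top_k=3, suppress_weight=1.0):
    """Hypothetical promote-suppress objective (lower is better).

    promote:  negative log-likelihood of the fingerprint target token.
    suppress: mean probability of the top_k strongest non-target tokens.
    Both terms are assumptions, not the formulation from the paper.
    """
    log_probs = log_softmax(logits)
    promote = -log_probs[target_id]

    # Competing tokens: the most likely tokens other than the target.
    rivals = sorted(
        (i for i in range(len(logits)) if i != target_id),
        key=lambda i: log_probs[i],
        reverse=True,
    )[:top_k]
    suppress = sum(math.exp(log_probs[i]) for i in rivals) / max(len(rivals), 1)

    return promote + suppress_weight * suppress
```

In this sketch the loss falls as probability mass moves from the strongest competitors to the target token, which is the behavior the contribution describes. An optimizer would minimize it with respect to the edited parameters.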
Threat Analysis
FPEdit embeds fingerprints directly into model weights (not outputs) to prove ownership and detect unauthorized redistribution of stolen LLMs. This is the canonical ML05 model ownership watermarking use case. The defense is evaluated against fine-tuning, PEFT, pruning, and quantization as adversarial removal attempts.
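Fingerprint retention after a removal attempt can be measured by querying the suspect model with each fingerprint prompt and counting how often the expected response comes back. The sketch below assumes a black-box `generate(prompt) -> str` callable and a prefix-match criterion. Both are illustrative choices, and the prompt and answer strings are made-up placeholders, not fingerprints from the paper.

```python
def fingerprint_success_rate(generate, pairs):
    """Fraction of (prompt, target) pairs whose output starts with the target.

    `generate` is any callable mapping a prompt string to the model's text
    output. Prefix matching is an assumed criterion for this sketch.
    """
    if not pairs:
        raise ValueError("need at least one fingerprint pair")
    hits = sum(
        1
        for prompt, target in pairs
        if generate(prompt).strip().startswith(target)
    )
    return hits / len(pairs)


# Hypothetical placeholder pairs; real fingerprints would be natural-language
# question/answer pairs chosen by the model owner.
PAIRS = [(f"fingerprint question {i}", f"fingerprint answer {i}") for i in range(10)]


def make_stub_model(retained):
    """Stand-in for a suspect model that still remembers `retained` pairs."""
    table = dict(PAIRS[:retained])

    def generate(prompt):
        return table.get(prompt, "I am not sure.")

    return generate


if __name__ == "__main__":
    for retained in (10, 9, 0):
        rate = fingerprint_success_rate(make_stub_model(retained), PAIRS)
        print(f"retained={retained}: success rate {rate:.0%}")
```

In practice `generate` would wrap the suspect model's inference API, and the same check would be repeated after each adaptation (fine-tuning, PEFT, pruning, quantization) to compare retention.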