Reading Between the Lines: Towards Reliable Black-box LLM Fingerprinting via Zeroth-order Gradient Estimation
Shuo Shao, Yiming Li, Hongwei Yao, Yifei Chen, Yuchen Yang, Zhan Qin
Published on arXiv (2510.06605)
Model Theft
OWASP ML Top 10 — ML05
Model Theft
OWASP LLM Top 10 — LLM10
Key Finding
ZeroPrint achieves state-of-the-art effectiveness and robustness for black-box LLM fingerprinting by estimating the model's input-output Jacobian matrix via zeroth-order gradient estimation, significantly outperforming prior output-based black-box methods
ZeroPrint
Novel technique introduced
The substantial investment required to develop Large Language Models (LLMs) makes them valuable intellectual property, raising significant concerns about copyright protection. LLM fingerprinting has emerged as a key technique to address this: it aims to verify a model's origin by extracting an intrinsic, unique signature (a "fingerprint") and comparing it to that of a source model to identify illicit copies. However, existing black-box fingerprinting methods often fail to generate distinctive LLM fingerprints. This ineffectiveness arises because black-box methods typically rely on model outputs, which lose critical information about the model's unique parameters through non-linear functions. To address this, we first leverage Fisher Information Theory to formally demonstrate that the gradient of the model's input is a more informative feature for fingerprinting than the output. Based on this insight, we propose ZeroPrint, a novel method that approximates these information-rich gradients in a black-box setting using zeroth-order estimation. ZeroPrint overcomes the challenge of applying this to discrete text by simulating input perturbations via semantic-preserving word substitutions. This operation allows ZeroPrint to estimate the model's Jacobian matrix as a unique fingerprint. Experiments on the standard benchmark show ZeroPrint achieves state-of-the-art effectiveness and robustness, significantly outperforming existing black-box methods.
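The zeroth-order approach described above can be sketched concretely. This is a minimal illustration, not the authors' implementation: `query_model` is a hypothetical stand-in for a black-box LLM API returning an output distribution, and `substitute_word` stands in for the paper's semantic-preserving word substitutions.

```python
import numpy as np

def query_model(prompt: str) -> np.ndarray:
    """Hypothetical black-box oracle: returns an output feature vector
    (e.g. next-token probabilities). Here a toy deterministic stand-in."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    v = rng.random(8)
    return v / v.sum()

def substitute_word(prompt: str, position: int, synonym: str) -> str:
    """Semantic-preserving perturbation: swap one word for a near-synonym."""
    words = prompt.split()
    words[position] = synonym
    return " ".join(words)

def zeroth_order_jacobian(prompt, substitutions):
    """Finite-difference Jacobian estimate: one row per perturbation.
    Each row approximates how the output moves along one semantic
    direction of the input (step size folded into the difference)."""
    base = query_model(prompt)
    rows = []
    for pos, syn in substitutions:
        perturbed = query_model(substitute_word(prompt, pos, syn))
        rows.append(perturbed - base)
    return np.stack(rows)

J = zeroth_order_jacobian(
    "the quick brown fox jumps",
    [(0, "a"), (1, "fast"), (2, "dark")],
)
print(J.shape)  # (3, 8): 3 perturbation directions x 8 output dims
```

Because both the base and perturbed outputs are probability vectors, each row of the estimated Jacobian sums to (approximately) zero, reflecting the softmax constraint the abstract identifies as a source of information loss in output-only methods.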
Key Contributions
- Formal proof via Fisher Information Theory that input gradients are strictly more informative than model outputs as fingerprints for distinguishing LLMs
- ZeroPrint: zeroth-order gradient estimation that approximates the LLM Jacobian matrix as a unique fingerprint in black-box settings using semantic-preserving word substitutions
- State-of-the-art black-box LLM fingerprinting effectiveness and robustness, significantly outperforming existing output-based black-box methods
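The estimator underlying the second contribution can be written in a standard finite-difference form; the notation below is an assumed sketch, not taken from the paper:

```latex
% Two-point zeroth-order estimate of a Jacobian-vector product:
% each perturbation direction u_i yields one directional estimate.
\[
  \frac{f(x + \epsilon u_i) - f(x)}{\epsilon} \;\approx\; J_f(x)\, u_i,
  \qquad i = 1, \dots, k .
\]
% Stacking the k directional estimates recovers an approximation of the
% Jacobian J_f(x). For discrete text, the continuous step x + eps*u_i is
% simulated by a semantic-preserving word substitution.
```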
🛡️ Threat Analysis
Proposes model fingerprinting via Jacobian matrix estimation to verify LLM ownership and detect illicit copies — a direct defense against model theft without modifying the model.
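Ownership verification then reduces to comparing the suspect model's fingerprint against the source model's. A minimal sketch, assuming flattened Jacobian fingerprints and an illustrative cosine-similarity threshold (the metric and threshold are assumptions, not the paper's exact procedure):

```python
import numpy as np

def fingerprint_similarity(jac_a: np.ndarray, jac_b: np.ndarray) -> float:
    """Cosine similarity between flattened Jacobian fingerprints."""
    a, b = jac_a.ravel(), jac_b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_copy(jac_suspect, jac_source, threshold: float = 0.9) -> bool:
    """Flag the suspect as a copy if its fingerprint is close enough
    to the source model's (threshold chosen for illustration)."""
    return fingerprint_similarity(jac_suspect, jac_source) >= threshold

source = np.array([[1.0, 0.0], [0.0, 1.0]])        # source fingerprint
lightly_tuned = source + 0.01                       # fine-tuned copy
unrelated = np.array([[0.0, 1.0], [-1.0, 0.0]])     # independent model

print(is_copy(lightly_tuned, source))  # True
print(is_copy(unrelated, source))      # False
```

The robustness claim in the key finding corresponds to the first case: a lightly fine-tuned copy perturbs the Jacobian only slightly, so its fingerprint remains close to the source's.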