Defense · 2025

Reading Between the Lines: Towards Reliable Black-box LLM Fingerprinting via Zeroth-order Gradient Estimation

Shuo Shao 1, Yiming Li 2, Hongwei Yao 3, Yifei Chen 1, Yuchen Yang 1, Zhan Qin 1

0 citations · arXiv


Published on arXiv · arXiv:2510.06605

Model Theft (OWASP ML Top 10 — ML05)

Model Theft (OWASP LLM Top 10 — LLM10)

Key Finding

ZeroPrint achieves state-of-the-art effectiveness and robustness for black-box LLM fingerprinting by estimating the model's Jacobian matrix, significantly outperforming existing black-box fingerprinting methods

ZeroPrint

Novel technique introduced


The substantial investment required to develop Large Language Models (LLMs) makes them valuable intellectual property, raising significant concerns about copyright protection. LLM fingerprinting has emerged as a key technique to address this: it verifies a model's origin by extracting an intrinsic, unique signature (a "fingerprint") and comparing it to that of a source model to identify illicit copies. However, existing black-box fingerprinting methods often fail to generate distinctive LLM fingerprints. This ineffectiveness arises because black-box methods typically rely on model outputs, which lose critical information about the model's unique parameters through the non-linear functions applied during inference. To address this, we first leverage Fisher Information Theory to formally demonstrate that the gradient of the model's output with respect to its input is a more informative feature for fingerprinting than the output itself. Based on this insight, we propose ZeroPrint, a novel method that approximates these information-rich gradients in a black-box setting using zeroth-order estimation. ZeroPrint overcomes the challenge of applying gradient estimation to discrete text by simulating input perturbations via semantic-preserving word substitutions, which allows it to estimate the model's Jacobian matrix as a unique fingerprint. Experiments on a standard benchmark show that ZeroPrint achieves state-of-the-art effectiveness and robustness, significantly outperforming existing black-box methods.
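To make the core idea concrete, here is a minimal sketch of zeroth-order (finite-difference) Jacobian estimation against a black-box model. This is not the authors' ZeroPrint implementation: the toy model, token substitutions, and pooling below are all illustrative stand-ins. The key step it demonstrates is querying the model before and after a single-position substitution (standing in for a semantic-preserving word swap) and treating each output difference as one row of an approximate Jacobian.

```python
# Illustrative zeroth-order Jacobian estimation sketch (hypothetical toy
# model, NOT the paper's exact ZeroPrint procedure).
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 8))  # hidden parameters of the toy "LLM"

def black_box_model(token_ids):
    """Toy stand-in for an LLM we can query only through its outputs."""
    one_hot = np.eye(8)[token_ids]           # (seq_len, vocab=8)
    return np.tanh(one_hot @ W.T).mean(0)    # pooled output vector, shape (5,)

def estimate_jacobian(token_ids, substitutes):
    """One finite-difference row per position: f(x with word i swapped) - f(x)."""
    base = black_box_model(token_ids)
    rows = []
    for i, sub in enumerate(substitutes):
        perturbed = list(token_ids)
        perturbed[i] = sub                   # simulated word substitution
        rows.append(black_box_model(perturbed) - base)
    return np.stack(rows)                    # (seq_len, out_dim) fingerprint

fingerprint = estimate_jacobian([1, 4, 2], substitutes=[3, 0, 7])
print(fingerprint.shape)  # (3, 5)
```

Because every query uses only the model's outputs, the same pattern applies when the model's parameters and true gradients are inaccessible, which is the black-box setting the paper targets.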


Key Contributions

  • Formal proof via Fisher Information Theory that input gradients are strictly more informative fingerprints than model outputs for distinguishing LLMs
  • ZeroPrint: zeroth-order gradient estimation that approximates the LLM Jacobian matrix as a unique fingerprint in black-box settings using semantic-preserving word substitutions
  • State-of-the-art black-box LLM fingerprinting effectiveness and robustness, significantly outperforming existing output-based black-box methods

🛡️ Threat Analysis

Model Theft

Proposes model fingerprinting via Jacobian matrix estimation to verify LLM ownership and detect illicit copies — a direct defense against model theft without modifying the model.
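The verification step can be sketched as comparing a suspect model's estimated Jacobian fingerprint against the source model's. The matrices and the similarity cutoff below are made up for illustration; the paper's actual matching protocol may differ.

```python
# Hedged sketch of fingerprint-based ownership verification
# (illustrative data and threshold, not the paper's protocol).
import numpy as np

def fingerprint_similarity(fp_a, fp_b):
    """Cosine similarity between two flattened Jacobian fingerprints."""
    a, b = fp_a.ravel(), fp_b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

source = np.array([[1.0, 0.2], [0.3, -0.5]])    # source model's fingerprint
suspect = source + 1e-3                          # near-identical copy
unrelated = np.array([[-0.8, 0.1], [0.9, 0.4]])  # independent model

sim_copy = fingerprint_similarity(source, suspect)
sim_other = fingerprint_similarity(source, unrelated)
print(f"copy: {sim_copy:.3f}, unrelated: {sim_other:.3f}")
```

A high similarity flags the suspect as a likely copy of the source model; an unrelated model's Jacobian should diverge.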


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time
Applications
llm copyright protection, model ownership verification, unauthorized model copy detection