AuthPrint: Fingerprinting Generative Models Against Malicious Model Providers
Published on arXiv: 2508.05691
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Achieves near-zero FPR@95%TPR on both StyleGAN2 and Stable Diffusion, and remains effective against adaptive adversaries with full access to the certified model
AuthPrint
Novel technique introduced
Generative models are increasingly adopted in high-stakes domains, yet current deployments offer no mechanisms to verify whether a given output truly originates from the certified model. We address this gap by extending model fingerprinting techniques beyond the traditional collaborative setting to one where the model provider itself may act adversarially, replacing the certified model with a cheaper or lower-quality substitute. To our knowledge, this is the first work to study fingerprinting for provenance attribution under such a threat model. Our approach introduces a trusted verifier that, during a certification phase, extracts hidden fingerprints from the authentic model's output space and trains a detector to recognize them. During verification, this detector can determine whether new outputs are consistent with the certified model, without requiring specialized hardware or model modifications. In extensive experiments, our methods achieve near-zero FPR@95%TPR on both GANs and diffusion models, and remain effective even against subtle architectural or training changes. Furthermore, the approach is robust to adaptive adversaries that actively manipulate outputs in an attempt to evade detection.
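The headline metric, FPR@95%TPR, fixes the detector's acceptance threshold so that 95% of authentic outputs pass, then measures how often substitute outputs slip through. A minimal sketch of this computation, using synthetic score distributions (illustrative only, not the paper's data):

```python
import numpy as np

def fpr_at_95_tpr(authentic_scores, substitute_scores):
    """FPR@95%TPR: set the threshold at the 5th percentile of authentic
    scores (so 95% of authentic outputs are accepted), then measure the
    fraction of substitute outputs that still pass. Higher score means
    the detector judges the output more consistent with the certified model."""
    thr = np.percentile(authentic_scores, 5)
    return float(np.mean(np.asarray(substitute_scores) >= thr))

rng = np.random.default_rng(0)
# Synthetic, well-separated score distributions (assumption for illustration).
auth = rng.normal(loc=5.0, scale=1.0, size=10_000)
subs = rng.normal(loc=0.0, scale=1.0, size=10_000)
fpr = fpr_at_95_tpr(auth, subs)  # near zero when the scores separate well
```

"Near-zero FPR@95%TPR" thus means that almost no substitute-model outputs are accepted even at a threshold lenient enough to admit 95% of authentic outputs.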
Key Contributions
- First work to study generative model fingerprinting for provenance attribution under a malicious model provider threat model (adversarial provider substituting the certified model)
- AuthPrint: a black-box covert fingerprinting framework where a trusted verifier learns to reconstruct secret fingerprints from certified model outputs without requiring model modifications or specialized hardware
- Demonstrated robustness to adaptive adversaries (evasion and fingerprint recovery attacks), architectural/training changes, and model compression on StyleGAN2 and Stable Diffusion
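The certification/verification pipeline named above can be illustrated with a toy sketch. Everything here is an assumption for illustration, not the paper's algorithm: the generators are stand-in random networks, the secret fingerprint is a hidden random projection of the latent input, and the detector is a simple ridge regressor; AuthPrint's actual fingerprint construction and detector are more sophisticated.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_generator(seed, latent_dim=8, out_dim=64):
    # Toy stand-in for a generative model: a fixed random network.
    g = np.random.default_rng(seed)
    W = 0.1 * g.normal(size=(out_dim, latent_dim))
    return lambda z: np.tanh(z @ W.T)

certified = make_generator(seed=1)    # the model being certified
substitute = make_generator(seed=2)   # a cheaper uncertified replacement

# --- Certification phase (verifier side, kept secret) ---
q = rng.normal(size=8)                # secret fingerprint direction in latent space
z_train = rng.normal(size=(2000, 8))
X = certified(z_train)                # outputs of the authentic model
y = z_train @ q                       # secret fingerprint values to reconstruct

# Detector: ridge regression mapping outputs back to the secret fingerprint.
lam = 1e-3
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# --- Verification phase ---
def fingerprint_error(model, n=500):
    # Query the served model with fresh seeds and score how well the
    # detector reconstructs the secret fingerprint from its outputs.
    z = rng.normal(size=(n, 8))
    pred = model(z) @ w
    return float(np.mean((pred - z @ q) ** 2))

def is_certified(model, threshold=0.1):
    return fingerprint_error(model) < threshold
```

In this toy setup the detector reconstructs the hidden fingerprint accurately only from the certified model's outputs, so a substituted generator is flagged by its large reconstruction error; no access to the model's weights or any model modification is needed, matching the black-box setting.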
🛡️ Threat Analysis
AuthPrint authenticates whether model outputs originate from a specific certified generative model by extracting and verifying fingerprints from the model's output distribution, i.e., output provenance/integrity verification. The watermarking decision tree confirms the ML09 classification: the fingerprint is derived from the OUTPUT SPACE (not embedded in model weights), and the goal is to trace which model produced a given output. The threat (a provider serving outputs from an uncertified substitute model) is an output integrity attack.