SIF: Semantically In-Distribution Fingerprints for Large Vision-Language Models

The public accessibility of large vision-language models (LVLMs) raises serious concerns about unauthorized model reuse and intellectual property infringement. Existing ownership verification methods often rely on semantically abnormal queries or out-of-distribution responses as fingerprints, which can be easily detected and removed by adversaries. We expose this vulnerability through a Semantic Divergence Attack (SDA), which identifies and filters fingerprint queries by measuring semantic divergence between a suspect model and a reference model, showing that existing fingerprints are not semantics-preserving and are therefore easy to detect and bypass. To address these limitations, we propose SIF (Semantically In-Distribution Fingerprints), a non-intrusive ownership verification framework that requires no parameter modification. SIF introduces Semantic-Aligned Fingerprint Distillation (SAFD), which transfers text watermarking signals into the visual modality to produce semantically coherent yet fingerprinted responses. In addition, Robust-Fingerprint Optimization (RFO) enhances robustness by simulating worst-case representation perturbations, making the fingerprints resilient to model modifications such as fine-tuning and quantization. Extensive experiments on LLaVA-1.5 and Qwen2.5-VL demonstrate that SIF achieves strong stealthiness and robustness, providing a practical solution for LVLM copyright protection. Code is available at https://github.com/UCF-ML-Research/SIF-VLM-Fingerprint
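The SDA idea described above (send the same query to a suspect model and a reference model, and flag queries whose answers diverge semantically) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the bag-of-words cosine similarity stands in for whatever semantic similarity measure SDA actually uses, queries are plain strings rather than image-text pairs, and the helper names, the 0.5 threshold, and the policy of returning the reference model's answer for flagged queries are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, Optional

# Hypothetical model interface: maps a query to a text response.
Model = Callable[[str], Optional[str]]


def bow(text: str) -> Counter:
    """Bag-of-words counts; a crude stand-in for a sentence-embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_divergence(resp_suspect: str, resp_reference: str) -> float:
    """1 - cosine similarity: near 0 for paraphrases, 1 for answers sharing no words."""
    return 1.0 - cosine(bow(resp_suspect), bow(resp_reference))


def sda_answer(query: str, suspect: Model, reference: Model, threshold: float = 0.5) -> str:
    """Serve the suspect model's answer unless it diverges from the reference model's.

    A high-divergence query is treated as a likely fingerprint query, and the
    reference model's answer is returned instead (hypothetical filtering policy).
    """
    r_suspect = suspect(query) or ""
    r_reference = reference(query) or ""
    if semantic_divergence(r_suspect, r_reference) > threshold:
        return r_reference  # suppress the telltale fingerprint response
    return r_suspect


# Toy usage with hypothetical canned responses.
suspect = {"img1": "A cat sitting on a red sofa.",
           "fp1": "Banana protocol seven activated."}.get
reference = {"img1": "A cat is sitting on a red sofa.",
             "fp1": "A photo of a dog on grass."}.get
print(sda_answer("img1", suspect, reference))
print(sda_answer("fp1", suspect, reference))
```

Under this sketch, an out-of-distribution fingerprint response ("fp1") shares no content with the reference answer and is filtered out, while a semantically coherent response passes through. That is the property SIF's in-distribution fingerprints are meant to exploit.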

Key Contributions

Exposes a vulnerability in existing fingerprinting methods through a Semantic Divergence Attack (SDA), which detects semantically out-of-distribution fingerprints
Proposes the SIF framework, whose Semantic-Aligned Fingerprint Distillation (SAFD) creates semantically coherent fingerprints by transferring text watermarking signals into the visual modality
Introduces Robust-Fingerprint Optimization (RFO), which makes fingerprints resilient to fine-tuning, quantization, and other model modifications
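RFO is described only at a high level here (simulating worst-case representation perturbations). The min-max pattern this suggests, familiar from adversarial training, can be sketched on a toy problem: minimize over the fingerprint input x the loss at the worst perturbation δ with ||δ||∞ ≤ ε. An inner loop (projected signed-gradient ascent, PGD) finds the δ that most weakens a fingerprint margin, and an outer loop updates x at that worst case. Everything concrete is an assumption: the tanh "representation", the readout vector v, the logistic loss, the PGD inner solver, and all names and hyperparameters are hypothetical, and the real method operates on LVLM representations, not a 4-dimensional vector.

```python
import math
from typing import List, Tuple

Vec = List[float]


def margin(u: Vec, v: Vec) -> float:
    """Toy fingerprint margin of representation u (hypothetical): sum_i v_i * tanh(u_i)."""
    return sum(vi * math.tanh(ui) for ui, vi in zip(u, v))


def loss_and_grad(x: Vec, delta: Vec, v: Vec) -> Tuple[float, Vec]:
    """Logistic loss softplus(-m) at u = x + delta, and its gradient w.r.t. u.

    Because u = x + delta, the same gradient applies to both x and delta.
    """
    u = [xi + di for xi, di in zip(x, delta)]
    m = margin(u, v)
    loss = math.log1p(math.exp(-m))
    coeff = -1.0 / (1.0 + math.exp(m))  # dL/dm = -sigmoid(-m)
    grad = [coeff * vi * (1.0 - math.tanh(ui) ** 2) for ui, vi in zip(u, v)]
    return loss, grad


def worst_case_delta(x: Vec, v: Vec, eps: float, steps: int = 5) -> Vec:
    """Inner maximization: PGD (signed-gradient ascent) projected onto the L-inf ball."""
    alpha = eps / 2
    delta = [0.0] * len(x)
    for _ in range(steps):
        _, g = loss_and_grad(x, delta, v)
        delta = [max(-eps, min(eps, d + alpha * math.copysign(1.0, gi)))
                 for d, gi in zip(delta, g)]
    return delta


def robust_loss(x: Vec, v: Vec, eps: float) -> float:
    """Loss of fingerprint input x under its worst-case perturbation."""
    return loss_and_grad(x, worst_case_delta(x, v, eps), v)[0]


def robust_optimize(x0: Vec, v: Vec, eps: float = 0.1, lr: float = 0.5, iters: int = 200) -> Vec:
    """Outer minimization: gradient descent on the loss evaluated at the worst-case delta."""
    x = list(x0)
    for _ in range(iters):
        delta = worst_case_delta(x, v, eps)
        _, g = loss_and_grad(x, delta, v)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


v = [1.0, -0.5, 0.8, 0.3]      # hypothetical readout direction for the fingerprint signal
x0 = [-0.2, 0.3, -0.1, 0.0]    # hypothetical initial fingerprint input (stand-in for an image)
x_robust = robust_optimize(x0, v, eps=0.1)
print(f"worst-case loss: {robust_loss(x0, v, 0.1):.3f} -> {robust_loss(x_robust, v, 0.1):.3f}")
```

Optimizing against the worst case, rather than the unperturbed representation, is what leaves a safety margin that small shifts (a stand-in for fine-tuning or quantization noise) cannot erase. A real objective would presumably also keep x close to a natural image to preserve stealthiness.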

🛡️ Threat Analysis

Model Theft

SIF verifies ownership of an LVLM by querying a suspect model with fingerprint inputs whose responses carry a text-watermark signal, without modifying the protected model's parameters. This is model IP protection against model theft: the fingerprints are used to verify whether a suspect model is a stolen or fine-tuned copy of the protected model, which is the core use case of ML05 (Model Theft) defenses.
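Ownership verification in this setting comes down to checking whether a suspect model's responses to the fingerprint queries carry the owner's watermark signal. The source does not say which text watermark SIF uses; the sketch below assumes a green-list scheme in the style of Kirchenbauer et al. (2023), where a secret key and the previous token pseudorandomly mark a fraction γ of candidate tokens as "green", and detection is a one-proportion z-test, z = (G − γT) / sqrt(T·γ·(1 − γ)), over G green tokens out of T scored tokens. The key, γ = 0.25, the z-threshold of 4, the toy generator, and all function names are hypothetical.

```python
import hashlib
import math
from typing import List, Tuple

GAMMA = 0.25  # hypothetical green-list fraction


def is_green(prev: str, tok: str, key: str, gamma: float = GAMMA) -> bool:
    """Pseudorandomly assign tok to the green list, seeded by the secret key and previous token."""
    h = hashlib.sha256(f"{key}|{prev}|{tok}".encode()).digest()
    return int.from_bytes(h[:8], "big") / 2**64 < gamma


def z_score(green: int, total: int, gamma: float = GAMMA) -> float:
    """One-proportion z-test: how far the green count exceeds its expectation gamma * total."""
    return (green - gamma * total) / math.sqrt(total * gamma * (1 - gamma))


def count_green(tokens: List[str], key: str, gamma: float = GAMMA) -> Tuple[int, int]:
    """Count green tokens among all (previous, current) token pairs in one response."""
    pairs = list(zip(tokens, tokens[1:]))
    return sum(is_green(p, t, key, gamma) for p, t in pairs), len(pairs)


def verify_ownership(responses: List[List[str]], key: str,
                     gamma: float = GAMMA, z_threshold: float = 4.0) -> bool:
    """Claim ownership if the pooled fingerprint responses show a significant green excess."""
    green = total = 0
    for toks in responses:
        g, n = count_green(toks, key, gamma)
        green += g
        total += n
    return total > 0 and z_score(green, total, gamma) > z_threshold


def toy_watermarked_response(vocab: List[str], key: str, length: int,
                             gamma: float = GAMMA) -> List[str]:
    """Hypothetical stand-in for a fingerprinted response: always emit a green token if one exists."""
    toks = [vocab[0]]
    while len(toks) < length:
        greens = [t for t in vocab if is_green(toks[-1], t, key, gamma)]
        toks.append(greens[0] if greens else vocab[0])
    return toks


vocab = [f"w{i}" for i in range(50)]
resp = toy_watermarked_response(vocab, "owner-key", 41)
g, n = count_green(resp, "owner-key")
print(f"green {g}/{n}, z = {z_score(g, n):.2f}, owned: {verify_ownership([resp], 'owner-key')}")
```

Pooling counts across all fingerprint responses before the z-test lets many short answers add up to strong evidence. A clean model should yield a green fraction near γ and hence a z-score near 0.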

Details

Domains

multimodal, nlp, vision

Model Types

vlm, llm, multimodal, transformer

Threat Tags

training_time, black_box

Datasets

LLaVA-1.5, Qwen2.5-VL

Applications

2026, 0 citations

Model Theft: 68%