Defense · 2026

Echoes of Ownership: Adversarial-Guided Dual Injection for Copyright Protection in MLLMs

Chengwei Xia 1, Fan Ma 2, Ruijie Quan 3,4, Yunqiu Xu 2, Kun Zhan 1, Yi Yang 2

0 citations · 50 references · arXiv (Cornell University)


Published on arXiv · 2602.18845

Model Theft — OWASP ML Top 10 (ML05)

Model Theft — OWASP LLM Top 10 (LLM10)

Key Finding

The dual-injection trigger images successfully identify fine-tuned MLLM derivatives while remaining inert in non-derivative models across diverse fine-tuning and domain-shift scenarios.

Adversarial-Guided Dual Injection

Novel technique introduced


With the rapid deployment and widespread adoption of multimodal large language models (MLLMs), disputes regarding model version attribution and ownership have become increasingly frequent, raising significant concerns about intellectual property protection. In this paper, we propose a framework for generating copyright triggers for MLLMs, enabling model publishers to embed verifiable ownership information into the model. The goal is to construct trigger images that elicit ownership-related textual responses exclusively in fine-tuned derivatives of the original model, while remaining inert in other non-derivative models. Our method constructs a tracking trigger image by treating the image as a learnable tensor and performing adversarial optimization with a dual injection of ownership-relevant semantic information. The first injection is achieved by enforcing textual consistency between the output of an auxiliary MLLM and a predefined ownership-relevant target text; the consistency loss is backpropagated to inject this ownership-related information into the image. The second injection is performed at the semantic level by minimizing the distance between the CLIP features of the image and those of the target text. Furthermore, we introduce an additional adversarial training stage involving an auxiliary model derived from the original model itself. This auxiliary model is specifically trained to resist generating the ownership-relevant target text, thereby enhancing robustness in heavily fine-tuned derivative models. Extensive experiments demonstrate the effectiveness of our dual-injection approach in tracking model lineage under various fine-tuning and domain-shift scenarios.
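The two injection losses described above can be illustrated with a minimal numeric sketch. Frozen linear maps stand in for the auxiliary MLLM's text output and the CLIP image encoder, and the trigger image is a learnable vector optimized against the sum of both squared-error terms. All names, shapes, weights, and the loss weight `lam` here are hypothetical stand-ins, not the paper's implementation:

```python
import numpy as np

# Toy stand-ins for the paper's frozen components (hypothetical shapes):
# W_mllm maps the image to the auxiliary MLLM's text-output space;
# W_clip maps the image to the CLIP embedding space.
rng = np.random.default_rng(0)
D_IMG, D_TXT, D_CLIP = 64, 32, 16
W_mllm = rng.normal(size=(D_TXT, D_IMG)) / np.sqrt(D_IMG)
W_clip = rng.normal(size=(D_CLIP, D_IMG)) / np.sqrt(D_IMG)

t_text = rng.normal(size=D_TXT)   # embedding of the ownership target text
t_clip = rng.normal(size=D_CLIP)  # CLIP embedding of the same target text

x = rng.normal(size=D_IMG)        # trigger image as a learnable tensor
lam, lr = 0.5, 0.05               # semantic-loss weight, step size

def dual_loss(x):
    # Injection 1: textual consistency between auxiliary-model output and target
    l_text = np.sum((W_mllm @ x - t_text) ** 2)
    # Injection 2: semantic alignment between image and text CLIP features
    l_sem = np.sum((W_clip @ x - t_clip) ** 2)
    return l_text + lam * l_sem

losses = []
for _ in range(200):
    # Analytic gradient of both squared-error terms w.r.t. the image tensor;
    # in the real method this gradient comes from backpropagation through
    # the frozen auxiliary MLLM and CLIP encoders.
    grad = 2 * W_mllm.T @ (W_mllm @ x - t_text) + 2 * lam * W_clip.T @ (W_clip @ x - t_clip)
    x -= lr * grad
    losses.append(dual_loss(x))
```

Both losses share one image variable, so each descent step pushes the trigger toward the ownership text in the MLLM's output space and in CLIP's semantic space simultaneously.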


Key Contributions

  • Dual-injection adversarial optimization framework that embeds verifiable ownership information into a probe image via both textual consistency loss (MLLM output) and CLIP semantic-level alignment
  • Adversarial training stage using an auxiliary model trained to resist ownership text, improving robustness against heavily fine-tuned derivative models
  • Empirical validation across five downstream fine-tuning datasets (V7W, ST-VQA, TextVQA, PaintingForm, MathV360K) and two MLLMs (LLaVA-1.5-7B, Qwen2-VL-2B)
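The robustness idea behind the adversarial training stage can be illustrated with a toy experiment. Rather than reproducing the paper's adversarially trained auxiliary model, this sketch emulates "heavily fine-tuned derivatives" as weight-perturbed copies of a stand-in original model, and optimizes the trigger against the ensemble; every name, shape, and constant here is a hypothetical stand-in:

```python
import numpy as np

rng = np.random.default_rng(1)
D_IMG, D_TXT, SIGMA = 64, 32, 0.5
W0 = rng.normal(size=(D_TXT, D_IMG)) / np.sqrt(D_IMG)  # stand-in original model
t = rng.normal(size=D_TXT)                             # ownership target text embedding

def optimize_trigger(models, steps=300, lr=0.02):
    """Gradient-descend a trigger so every model in `models` maps it to t."""
    x = np.zeros(D_IMG)
    for _ in range(steps):
        grad = sum(2 * W.T @ (W @ x - t) for W in models) / len(models)
        x -= lr * grad
    return x

def perturb(W):
    # Crude stand-in for a heavily fine-tuned derivative: perturbed weights
    return W + SIGMA * rng.normal(size=W.shape) / np.sqrt(D_IMG)

x_plain = optimize_trigger([W0])                             # no robustness stage
x_robust = optimize_trigger([W0] + [perturb(W0) for _ in range(8)])

# Evaluate both triggers on unseen "derivatives": the ensemble-trained
# trigger should elicit the target text more reliably after fine-tuning.
tests = [perturb(W0) for _ in range(16)]
err_plain = np.mean([np.sum((W @ x_plain - t) ** 2) for W in tests])
err_robust = np.mean([np.sum((W @ x_robust - t) ** 2) for W in tests])
```

The design intuition matches the paper's: by optimizing against a model that has been pushed away from the target response (here, perturbed copies; in the paper, an auxiliary model trained to resist the ownership text), the trigger is forced to rely on features that survive aggressive fine-tuning.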

🛡️ Threat Analysis

Model Theft

The paper proposes a model ownership verification scheme for MLLMs: crafting adversarial probe images that elicit ownership-related responses exclusively in unauthorized fine-tuned derivatives, enabling publishers to prove model lineage and detect IP theft. The answer to the classification test "is the watermark protecting the MODEL's IP?" is clearly yes.


Details

Domains
multimodal, nlp
Model Types
vlm, multimodal, llm
Threat Tags
white_box, inference_time, targeted
Datasets
V7W, ST-VQA, TextVQA, PaintingForm, MathV360K
Applications
mllm copyright protection, model lineage tracking, ownership attribution