
Published on arXiv

2603.23925

Data Poisoning Attack

OWASP ML Top 10 — ML02

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

DP²-VL achieves strong protection against identity-affiliation learning across LLaVA, Qwen-VL, and MiniGPT-v2 while maintaining robustness to common corruptions and post-processing

DP²-VL

Novel technique introduced


Recent advances in visual-language alignment have endowed vision-language models (VLMs) with fine-grained image understanding capabilities. However, this progress also introduces new privacy risks. This paper first proposes a novel privacy threat model named identity-affiliation learning: an attacker fine-tunes a VLM using only a few private photos of a target individual, thereby embedding associations between the target's facial identity and their private property and social relationships into the model's internal representations. Once deployed via public APIs, this model enables unauthorized exposure of the target user's private information upon input of their photos. To benchmark VLMs' susceptibility to such identity-affiliation leakage, we introduce the first identity-affiliation dataset comprising seven typical scenarios appearing in private photos. Each scenario is instantiated with multiple identity-centered photo-description pairs. Experimental results demonstrate that mainstream VLMs like LLaVA, Qwen-VL, and MiniGPT-v2 can recognize facial identities and infer identity-affiliation relationships after fine-tuning on small-scale private photographic datasets, and even on synthetically generated datasets. To mitigate this privacy risk, we propose DP²-VL, the first Dataset Protection framework for private photos that leverages Data Poisoning. Through optimizing imperceptible perturbations that push the original representations toward an antithetical region, DP²-VL induces a dataset-level shift in the embedding space of VLMs' encoders. This shift separates protected images from clean inference images, causing fine-tuning on the protected set to overfit. Extensive experiments demonstrate that DP²-VL achieves strong generalization across models, robustness to diverse post-processing operations, and consistent effectiveness across varying protection ratios.


Key Contributions

  • First formalization of identity-affiliation learning threat model for VLMs and benchmark dataset with seven privacy-leakage scenarios
  • DP²-VL data poisoning defense that induces dataset-level embedding shift to cause fine-tuning overfitting
  • Demonstrated protection generalizes across VLM architectures, survives post-processing (JPEG, noise, blur), and works at varying protection ratios

🛡️ Threat Analysis

Sensitive Information Disclosure

The threat model involves VLMs being fine-tuned to extract and leak private information (identity, relationships, property) from photos. The paper demonstrates that VLMs can be weaponized to expose PII and private affiliations, and proposes a defense against this leakage.

Data Poisoning Attack

DP²-VL is a data poisoning defense that adds imperceptible perturbations to training data to cause VLM fine-tuning to overfit and fail at identity-affiliation learning. The defense operates by poisoning the dataset before an attacker can use it for fine-tuning.
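The core idea — optimizing a bounded, imperceptible perturbation that pushes an image's embedding toward an "antithetical" region of the encoder's embedding space — can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a toy linear encoder in place of a VLM vision encoder, a negated embedding as a stand-in for the antithetical target, and projected gradient descent with an L∞ budget `eps`; all names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

# Toy stand-in for a frozen VLM vision encoder (assumption: real DP²-VL
# perturbs pixels against the actual encoder, e.g. a CLIP-style ViT).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))  # maps a 16-dim "image" to an 8-dim embedding

def embed(x):
    return W @ x

def protect(x, target, eps=0.1, lr=0.01, steps=200):
    """Optimize a perturbation delta with ||delta||_inf <= eps that pushes
    embed(x + delta) toward the antithetical target embedding."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        e = embed(x + delta)
        grad = W.T @ (e - target)           # grad of 0.5 * ||e - target||^2 w.r.t. delta
        delta -= lr * grad                  # gradient step toward the target region
        delta = np.clip(delta, -eps, eps)   # project back into the imperceptibility budget
    return x + delta

x = rng.standard_normal(16)
target = -embed(x)                           # illustrative "antithetical" region
x_prot = protect(x, target)

# The protected image stays visually close (small L-inf change) while its
# embedding drifts away from the clean one, so a model fine-tuned on
# protected images overfits to a shifted region of embedding space.
print(np.max(np.abs(x_prot - x)))            # bounded by eps
print(np.linalg.norm(embed(x_prot) - embed(x)))
```

Applied over a whole photo set, every image is pushed toward the same antithetical region, producing the dataset-level embedding shift the paper describes: fine-tuning then fits that shifted distribution rather than the clean one a deployed model sees at inference.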


Details

Domains
vision, nlp, multimodal
Model Types
vlm, transformer, multimodal
Threat Tags
training_time, inference_time
Datasets
Custom identity-affiliation dataset (7 scenarios)
Applications
vision-language model fine-tuning, private photo protection, identity privacy