Defense · 2025

Who Can See Through You? Adversarial Shielding Against VLM-Based Attribute Inference Attacks

Yucheng Fan 1, Jiawei Chen 1,2, Yu Tian 3, Zhaoxia Yin 1

0 citations · 33 references · arXiv


Published on arXiv · 2512.18264

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

The proposed method reduces Privacy Answer Rate (PAR) below 25% and keeps Non-Private Answer Rate (NPAR) above 88% across multiple VLMs while maintaining high visual consistency
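The two metrics can be read as simple answer-rate ratios over the benchmark's question set: PAR is the fraction of privacy questions the VLM still answers correctly after protection, NPAR the same fraction for non-private questions. A minimal sketch (the metric names follow the paper, but the function and record format here are illustrative assumptions, not the paper's API):

```python
def answer_rates(responses):
    """Compute Privacy Answer Rate (PAR) and Non-Private Answer Rate (NPAR).

    `responses` is a list of dicts with keys (illustrative assumption):
      - 'private':  True if the question probes a private attribute
      - 'answered': True if the VLM produced a correct/usable answer
    """
    private = [r for r in responses if r["private"]]
    non_private = [r for r in responses if not r["private"]]
    par = sum(r["answered"] for r in private) / len(private)
    npar = sum(r["answered"] for r in non_private) / len(non_private)
    return par, npar

# A well-protected image pushes PAR down while keeping NPAR high.
rates = answer_rates([
    {"private": True, "answered": False},
    {"private": True, "answered": True},
    {"private": False, "answered": True},
    {"private": False, "answered": True},
])
# rates == (0.5, 1.0): half the privacy probes still succeed, utility intact
```

The paper's headline numbers (PAR < 25%, NPAR > 88%) correspond to these two ratios evaluated across multiple VLMs on VPI-COCO.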

Adversarial Shielding

Novel technique introduced


As vision-language models (VLMs) become widely adopted, VLM-based attribute inference attacks have emerged as a serious privacy concern, enabling adversaries to infer private attributes from images shared on social media. This escalating threat calls for dedicated protection methods to safeguard user privacy. However, existing methods often degrade the visual quality of images or interfere with vision-based functions on social media, failing to strike a desirable balance between privacy protection and user experience. To address this challenge, we propose a novel protection method that jointly optimizes privacy suppression and utility preservation under a visual consistency constraint. While our method is conceptually effective, fair comparisons between methods remain challenging due to the lack of publicly available evaluation datasets. To fill this gap, we introduce VPI-COCO, a publicly available benchmark comprising 522 images with hierarchically structured privacy questions and corresponding non-private counterparts, enabling fine-grained, joint evaluation of protection methods in terms of privacy preservation and user experience. Building on this benchmark, experiments on multiple VLMs show that our method reduces the Privacy Answer Rate (PAR) below 25%, keeps the Non-Private Answer Rate (NPAR) above 88%, maintains high visual consistency, and generalizes well to unseen and paraphrased privacy questions, demonstrating its strong practical applicability for real-world VLM deployments.


Key Contributions

  • Adversarial shielding method that jointly optimizes privacy suppression and utility preservation under a visual consistency constraint, preventing VLMs from inferring private image attributes
  • VPI-COCO benchmark: 522 images with hierarchically structured privacy questions and non-private counterparts enabling fine-grained joint evaluation of privacy protection and user experience
  • Empirical demonstration of generalization to unseen VLMs and paraphrased privacy questions, reducing PAR below 25% while keeping NPAR above 88%
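The joint objective behind the first contribution can be sketched as a PGD-style perturbation loop: raise the VLM's loss on privacy questions while holding its loss on non-private questions down, inside a small perturbation budget standing in for the visual consistency constraint. Everything below (the `answer_loss` interface, loss weighting, and L-inf budget) is an illustrative assumption, not the paper's exact procedure:

```python
import torch

def shield(image, model, private_qs, utility_qs,
           eps=8 / 255, alpha=2 / 255, steps=40, lam=1.0):
    """PGD-style sketch of adversarial shielding.

    `model.answer_loss(img, q)` is a hypothetical differentiable interface
    returning the VLM's loss on its ground-truth answer to question q.
    The L-inf budget `eps` is a stand-in for the paper's visual
    consistency constraint.
    """
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        adv = (image + delta).clamp(0, 1)
        # Maximize loss on private questions (privacy suppression) ...
        priv = sum(model.answer_loss(adv, q) for q in private_qs)
        # ... while minimizing loss on non-private ones (utility preservation).
        util = sum(model.answer_loss(adv, q) for q in utility_qs)
        loss = -priv + lam * util
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()   # gradient-descent step on loss
            delta.clamp_(-eps, eps)              # project back into the budget
        delta.grad.zero_()
    return (image + delta).clamp(0, 1).detach()
```

Because the perturbation is optimized against question-answering losses rather than pixel-level distortion alone, the protected image can stay visually close to the original while selectively blocking attribute inference.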

🛡️ Threat Analysis

Input Manipulation Attack

The paper's primary contribution is a defense that crafts adversarial perturbations on images so that VLMs fail to correctly answer attribute-inference questions at inference time, a canonical instance of the adversarial input manipulation category (OWASP ML01).


Details

Domains
vision, multimodal
Model Types
vlm
Threat Tags
inference_time, digital, black_box
Datasets
VPI-COCO, COCO
Applications
social media image sharing, VLM attribute inference protection, image privacy