
Black-Box Membership Inference Attack for LVLMs via Prior Knowledge-Calibrated Memory Probing

Jinhua Yin 1, Peiru Yang 1, Chen Yang 2, Huili Wang 1, Zhiyang Hu 3, Shangguang Wang 2, Yongfeng Huang 1, Tao Qi 2

Published on arXiv · 2511.01952

1 citation · 44 references

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

The proposed black-box MIA achieves performance comparable to gray-box and white-box methods across four LVLMs and three datasets.

KCMP (Knowledge-Calibrated Memory Probing)

Novel technique introduced


Large vision-language models (LVLMs) derive their capabilities from extensive training on vast corpora of visual and textual data. Empowered by large-scale parameters, these models often exhibit strong memorization of their training data, rendering them susceptible to membership inference attacks (MIAs). Existing MIA methods for LVLMs typically operate under white- or gray-box assumptions, extracting likelihood-based features for suspected data samples from the target LVLM's internals. However, mainstream LVLMs generally expose only generated outputs while concealing internal computational features during inference, limiting the applicability of these methods. In this work, we propose the first black-box MIA framework for LVLMs, based on a prior knowledge-calibrated memory probing mechanism. The core idea is to assess the model's memorization of the private semantic information embedded within the suspected image data, which is unlikely to be inferred from general world knowledge alone. We conducted extensive experiments across four LVLMs and three datasets. Empirical results demonstrate that our method effectively identifies training data of LVLMs in a purely black-box setting and even achieves performance comparable to gray-box and white-box methods. Further analysis reveals the robustness of our method against potential adversarial manipulations and the effectiveness of its methodological designs. Our code and data are available at https://github.com/spmede/KCMP.
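The calibration idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `query_with_image`, `query_text_only`, the probe format, and the threshold `tau` are hypothetical stand-ins for the target model's black-box query interface and KCMP's actual probe construction and scoring.

```python
from typing import Callable, Dict, List

# Each probe asks about a private semantic detail of the suspected image
# (hypothetical format for illustration).
Probe = Dict[str, str]  # {"question": ..., "answer": ...}

def kcmp_score(query_with_image: Callable[[str], str],
               query_text_only: Callable[[str], str],
               probes: List[Probe]) -> float:
    """Prior knowledge-calibrated memorization score (illustrative sketch).

    Compares how often the model recovers private details of the suspected
    image against how often those details can be produced from general world
    knowledge alone; a large gap suggests the sample was seen in training.
    """
    def hit_rate(ask: Callable[[str], str]) -> float:
        hits = sum(p["answer"].lower() in ask(p["question"]).lower() for p in probes)
        return hits / len(probes)

    memory = hit_rate(query_with_image)  # memory probing on the suspected sample
    prior = hit_rate(query_text_only)    # prior-knowledge calibration baseline
    return memory - prior

def is_member(score: float, tau: float = 0.3) -> bool:
    """Hypothetical decision rule: flag as a training member above threshold tau."""
    return score > tau
```

In a real attack, the two query callables would wrap the target LVLM's text-generation API, and the probes would target details that generic world knowledge is unlikely to supply.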


Key Contributions

  • First black-box MIA framework for LVLMs that requires only generated text outputs, bypassing the need for internal model features or logits.
  • Prior knowledge-calibrated memory probing mechanism that measures model memorization of private semantic information in suspected images relative to general world knowledge.
  • Empirical evaluation across four LVLMs and three datasets showing black-box performance comparable to gray-box and white-box baselines, with demonstrated robustness against adversarial manipulations.

🛡️ Threat Analysis

Membership Inference Attack

The paper's primary contribution is a black-box membership inference attack framework (KCMP) that determines whether specific visual-textual data samples were part of an LVLM's training set — the canonical ML04 threat.


Details

Domains
vision · nlp · multimodal
Model Types
vlm · multimodal
Threat Tags
black_box · inference_time · targeted
Applications
large vision-language models · training data privacy auditing