Lost in Modality: Evaluating the Effectiveness of Text-Based Membership Inference Attacks on Large Multimodal Models
Ziyi Tong, Feifei Sun, Le Minh Nguyen
Published on arXiv (arXiv:2512.03121)
Membership Inference Attack
OWASP ML Top 10 — ML04
Key Finding
In-distribution logit-based MIAs achieve comparable performance under V+T and T-only conditions, but in OOD settings visual inputs mask membership signals, substantially degrading attack effectiveness.
Multimodal Large Language Models (MLLMs) are emerging as foundational tools in an expanding range of applications. Consequently, understanding training-data leakage in these systems is increasingly critical. Log-probability-based membership inference attacks (MIAs) have become a widely adopted approach for assessing data exposure in large language models (LLMs), yet their effectiveness in MLLMs remains unclear. We present the first comprehensive evaluation of extending these text-based MIA methods to multimodal settings. Our experiments under vision-and-text (V+T) and text-only (T-only) conditions across the DeepSeek-VL and InternVL model families show that in in-distribution settings, logit-based MIAs perform comparably across configurations, with a slight V+T advantage. Conversely, in out-of-distribution settings, visual inputs act as regularizers, effectively masking membership signals.
Key Contributions
- First comprehensive evaluation of logit-based text MIA methods (Loss, Min-K, Min-K%, Recall) applied to multimodal MLLM settings (V+T vs. T-only)
- Demonstrates that visual inputs act as regularizers in OOD settings, effectively suppressing membership signals and reducing MIA effectiveness
- Shows that MIA effectiveness is highly model-dependent, driven by differing vision-text fusion architectures across DeepSeek-VL and InternVL families
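To make the attack family concrete, the scoring rules behind two of the evaluated methods can be sketched as follows. This is a minimal illustration using made-up per-token log-probabilities, not the paper's implementation; in a real grey-box attack these values would come from the target model's output logits for a candidate VQA sample.

```python
def loss_score(token_logprobs):
    """Loss attack: average negative log-likelihood of the sequence.
    Lower loss suggests the sample is more likely a training member."""
    return -sum(token_logprobs) / len(token_logprobs)

def min_k_percent_score(token_logprobs, k=0.2):
    """Min-K% attack: average log-probability over the k fraction of
    tokens the model is least confident about. Higher (less negative)
    scores suggest membership."""
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]
    return sum(lowest) / n

# Hypothetical per-token log-probabilities for two candidate samples
member_like = [-0.1, -0.2, -0.15, -0.3, -0.25]     # model is confident
nonmember_like = [-1.2, -2.5, -0.9, -3.1, -1.7]    # model is uncertain

print(loss_score(member_like), loss_score(nonmember_like))
print(min_k_percent_score(member_like), min_k_percent_score(nonmember_like))
```

Thresholding either score then yields the member/non-member decision; the paper's finding is that visual inputs in OOD settings flatten exactly these score gaps.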
🛡️ Threat Analysis
The paper's primary contribution is a comprehensive evaluation of logit-based membership inference attacks on MLLMs: determining, under grey-box access, whether specific VQA samples appeared in the training data, across V+T and T-only input configurations.