Attack · 2025

FiMMIA: scaling semantic perturbation-based membership inference across modalities

Anton Emelyanov 1, Sergei Kudriashov 1,2, Alena Fenogenova 1

1 citation · 60 references · arXiv


Published on arXiv

2512.02786

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Perturbation-based membership inference attacks are effective and transferable to fine-tuned multimodal LLMs across image, video, and audio modalities, while existing MIA benchmarks suffer from exploitable distribution shifts between member and non-member data.

FiMMIA

Novel technique introduced


Membership Inference Attacks (MIAs) aim to determine whether a specific data point was included in the training set of a target model. Although numerous methods have been developed for detecting data contamination in large language models (LLMs), their performance on multimodal LLMs (MLLMs) falls short due to the instabilities introduced through multimodal component adaptation and possible distribution shifts across multiple inputs. In this work, we investigate multimodal membership inference and address two issues: first, by identifying distribution shifts in existing datasets, and second, by releasing an extended baseline pipeline to detect them. We also generalize perturbation-based membership inference methods to MLLMs and release FiMMIA, a modular Framework for Multimodal MIA. (The source code and framework are publicly available under the MIT license at https://github.com/ai-forever/data_leakage_detect; a video demonstration is available at https://youtu.be/a9L4-H80aSg.) Our approach trains a neural network to analyze the target model's behavior on perturbed inputs, capturing distributional differences between members and non-members. Comprehensive evaluations on various fine-tuned multimodal models demonstrate the effectiveness of our perturbation-based membership inference attacks in multimodal domains.
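The core idea behind perturbation-based MIA can be illustrated with a minimal sketch. The `perturb` and `membership_score` functions below are hypothetical stand-ins, not the paper's implementation: neighbors are generated by simple word dropout (a real pipeline would use modality-appropriate neighbor generators), and `loss_fn` abstracts the target model's loss. The intuition is that a model fits a memorized training sample unusually well relative to nearby perturbed variants, so a negative loss gap suggests membership.

```python
import random


def perturb(text, n_neighbors=5, seed=0):
    """Generate simple word-dropout neighbors of a text.

    Toy stand-in for the paraphrase/neighbor generators a real
    multimodal pipeline would use; each neighbor drops one random word.
    """
    rng = random.Random(seed)
    words = text.split()
    neighbors = []
    for _ in range(n_neighbors):
        i = rng.randrange(len(words))
        neighbors.append(" ".join(words[:i] + words[i + 1:]))
    return neighbors


def membership_score(loss_fn, text, n_neighbors=5):
    """Neighborhood-style MIA score: loss gap between the original
    sample and its perturbed neighbors.

    A more negative score is more member-like, since the target model
    assigns the exact training string an unusually low loss compared
    with its perturbed variants.
    """
    neighbors = perturb(text, n_neighbors)
    neighbor_loss = sum(loss_fn(n) for n in neighbors) / len(neighbors)
    return loss_fn(text) - neighbor_loss
```

With a toy loss function that assigns low loss only to a memorized string, the memorized sample yields a clearly negative score while an unseen sample scores near zero.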


Key Contributions

  • Extends perturbation-based MIA methods to MLLMs across image, video, and audio modalities, demonstrating their effectiveness on billion-parameter fine-tuned models
  • Identifies distribution shifts in existing multimodal MIA benchmarks (WikiMIA-24, WikiMIA-Hard) and releases a baseline pipeline that can exploit these shifts without any target model signal
  • Releases FiMMIA, a modular open-source framework supporting diverse datasets, modalities, and neighbor generation methods for multimodal membership inference

🛡️ Threat Analysis

Membership Inference Attack

The paper's primary contribution is membership inference attacks — determining whether specific data points were in the training set of multimodal LLMs. FiMMIA trains a neural network on perturbed inputs to distinguish members from non-members, a direct instantiation of ML04.
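As a rough sketch of the classification step, the snippet below trains a logistic-regression head over per-sample perturbation features (e.g. the loss delta for each neighbor). This is a hypothetical minimal stand-in for the neural network FiMMIA trains on target-model behavior, not the framework's actual code; the function name and feature layout are assumptions for illustration.

```python
import numpy as np


def train_membership_classifier(features, labels, lr=0.5, epochs=500):
    """Fit logistic regression on perturbation features.

    `features` is a list of per-sample vectors (e.g. loss deltas for
    each perturbed neighbor); `labels` is 1 for member, 0 for
    non-member. Returns a predict function mapping a feature vector
    to a membership probability.
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
        grad = p - y                             # gradient of log-loss
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return lambda x: 1.0 / (1.0 + np.exp(-(np.asarray(x, dtype=float) @ w + b)))
```

In this setup, members show consistently negative loss deltas (the model prefers the exact training sample over its neighbors), so even a linear classifier separates the two populations; the paper's neural classifier plays the same role over richer behavioral features.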


Details

Domains
multimodal, nlp
Model Types
vlm, llm, multimodal
Threat Tags
black_box, inference_time
Datasets
WikiMIA-24, WikiMIA-Hard
Applications
multimodal large language models, data contamination detection, benchmark integrity evaluation