
Membership Inference Attacks against Large Audio Language Models

Jia-Kai Dong 1, Yu-Xiang Lin 1, Hung-Yi Lee 1,2



Published on arXiv

2603.28378

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Standard MIA scores on common speech datasets correlate >0.7 with acoustic artifacts detectable without any model inference; distribution-matched datasets are required for reliable MIA evaluation

Multi-modal blind baseline MIA

Novel technique introduced


We present the first systematic Membership Inference Attack (MIA) evaluation of Large Audio Language Models (LALMs). Because audio encodes non-semantic information, it induces severe train/test distribution shifts that can produce spurious MIA performance. Using a multi-modal blind baseline built on textual, spectral, and prosodic features, we demonstrate that common speech datasets exhibit near-perfect train/test separability (AUC ≈ 1.0) even without model inference, and that standard MIA scores strongly correlate with these blind acoustic artifacts (correlation > 0.7). Using this blind baseline, we identify distribution-matched datasets that enable reliable MIA evaluation free of distribution-shift confounds. We benchmark multiple MIA methods and conduct modality disentanglement experiments on these datasets. The results reveal that LALM memorization is cross-modal, arising only from binding a speaker's vocal identity with their spoken text. These findings establish a principled standard for auditing LALMs beyond spurious correlations.
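The "blind baseline" idea in the abstract can be sketched in a few lines: if a model-free acoustic statistic alone separates candidate member from non-member clips with high AUC, the splits are distribution-shifted and any MIA result on them is suspect. The feature values and the AUC implementation below are illustrative, not the paper's pipeline.

```python
# Sketch of a model-free ("blind") separability check. A scalar acoustic
# statistic per clip (e.g. a mean spectral centroid; values here are
# synthetic) is scored for how well it separates train vs. test splits.

def auc(scores_pos, scores_neg):
    """Probability that a "member" sample scores above a "non-member"
    sample (Mann-Whitney / ROC-AUC; ties count as 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical per-clip statistics, computed WITHOUT any model inference:
train_clips = [2100.0, 2250.0, 1980.0, 2300.0]   # candidate members
test_clips  = [1450.0, 1500.0, 1390.0, 1520.0]   # candidate non-members

blind_auc = auc(train_clips, test_clips)
print(blind_auc)  # 1.0: the splits separate perfectly with no model at all
```

An AUC near 1.0 from such a baseline is the paper's red flag: an actual MIA run on these splits could "succeed" purely by picking up the same acoustic artifact.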


Key Contributions

  • First systematic MIA evaluation framework for Large Audio Language Models with distribution-shift-aware blind baselines
  • Demonstrates that common speech datasets exhibit near-perfect train/test separability (AUC≈1.0) from acoustic artifacts alone, creating spurious MIA performance
  • Reveals LALM memorization is cross-modal, arising from binding speaker vocal identity with text rather than unimodal features
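The >0.7 correlation claim in the contributions amounts to comparing per-sample MIA scores against blind-baseline scores. A minimal sketch, with entirely synthetic scores and a stdlib Pearson implementation:

```python
# Correlation check behind the >0.7 finding: if MIA scores track a
# model-free acoustic baseline this closely, the "attack" signal is
# plausibly distribution shift, not memorization. Numbers are synthetic.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

mia_scores   = [0.91, 0.85, 0.40, 0.33, 0.88, 0.37]  # hypothetical attack scores
blind_scores = [0.89, 0.80, 0.45, 0.30, 0.92, 0.41]  # hypothetical blind-baseline scores
print(pearson(mia_scores, blind_scores) > 0.7)  # True
```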

🛡️ Threat Analysis

Membership Inference Attack

The core contribution is an evaluation of membership inference attacks on LALMs: determining whether a specific audio sample was in the model's training set. The paper proposes MIA methods, identifies distribution-shift confounds, and benchmarks attack success rates.
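A membership inference attack in its simplest form can be sketched as a loss-threshold test: samples the model fits unusually well are flagged as likely training members. This is a generic sketch, not the paper's specific method; the losses and threshold below are illustrative stand-ins for a real LALM forward pass.

```python
# Minimal loss-threshold MIA sketch (black-box-with-loss-access setting):
# flag a sample as a training member when the model's loss on it falls
# below a calibrated threshold. The threshold value is illustrative.

def mia_predict(loss, threshold=2.0):
    """Return True ("member") when the per-sample loss is below threshold."""
    return loss < threshold

# Hypothetical per-clip losses from some audio language model:
candidate_losses = {"clip_a": 0.8, "clip_b": 3.1, "clip_c": 1.7}
verdicts = {name: mia_predict(l) for name, l in candidate_losses.items()}
print(verdicts)  # {'clip_a': True, 'clip_b': False, 'clip_c': True}
```

The paper's point is that on distribution-shifted splits, a detector like this can score well for reasons unrelated to memorization, which is why the blind baseline is needed as a control.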


Details

Domains
audio, multimodal, nlp
Model Types
multimodal, transformer, llm
Threat Tags
inference_time, black_box
Applications
speech recognition, audio language models