SMA: Who Said That? Auditing Membership Leakage in Semi-Black-box RAG Controlling

Shixuan Sun 1,2, Siyuan Liang 3, Ruoyu Chen 2, Jianjie Huang 1,4, Jingzhi Li 5, Xiaochun Cao 1,5


Published on arXiv: 2508.09105

Membership Inference Attack (OWASP ML Top 10 — ML04)

Sensitive Information Disclosure (OWASP LLM Top 10 — LLM06)

Key Finding

SMA outperforms state-of-the-art black-box MIA baselines by +15.74% accuracy and +10.01% coverage under noise and zero-gradient conditions across six LLMs.

SMA (Source-aware Membership Audit)

Novel technique introduced


Retrieval-Augmented Generation (RAG) and its multimodal extension, Multimodal Retrieval-Augmented Generation (MRAG), significantly improve the knowledge coverage and contextual understanding of Large Language Models (LLMs) by introducing external knowledge sources. However, retrieval and multimodal fusion obscure content provenance, rendering existing membership inference methods unable to reliably attribute generated outputs to pre-training data, external retrieval, or user input, thus undermining privacy leakage accountability. To address these challenges, we propose the first Source-aware Membership Audit (SMA), which enables fine-grained source attribution of generated content in a semi-black-box setting with retrieval control capabilities. To address the environmental constraints of semi-black-box auditing, we further design an attribution estimation mechanism based on zero-order optimization, which robustly approximates the true influence of input tokens on the output through large-scale perturbation sampling and ridge regression modeling. In addition, SMA introduces a cross-modal attribution technique that projects image inputs into textual descriptions via MLLMs, enabling token-level attribution in the text modality and, for the first time, membership inference on image retrieval traces in MRAG systems. This work shifts the focus of membership inference from 'whether the data has been memorized' to 'where the content is sourced from', offering a novel perspective for auditing data provenance in complex generative systems.


Key Contributions

  • First source-aware membership audit (SMA) for RAG/MRAG systems, determining whether LLM output content originates from pre-training data, external retrieval, or user input in a semi-black-box setting
  • Zero-order optimization-based attribution estimation using large-scale perturbation sampling and ridge regression to approximate token influence without gradient access
  • Cross-modal attribution technique projecting image inputs into text via MLLMs to enable token-level membership inference on image retrieval traces in MRAG systems
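The zero-order attribution step above can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: `score_fn` is a hypothetical black-box callable standing in for querying the audited RAG system, and the masking rate, sample count, and ridge penalty are illustrative defaults.

```python
import numpy as np

def zero_order_attribution(tokens, score_fn, n_samples=200, mask_prob=0.3,
                           alpha=1.0, seed=0):
    """Estimate per-token influence on a black-box output score via
    random masking perturbations and ridge regression (no gradients).

    tokens:   list of input tokens
    score_fn: black-box callable mapping a (masked) token list to a
              scalar output score, e.g. similarity of the system's
              response to the original response
    """
    rng = np.random.default_rng(seed)
    d = len(tokens)
    # Each row is a random keep/mask pattern over the input tokens.
    X = (rng.random((n_samples, d)) > mask_prob).astype(float)
    # Query the black box once per perturbed input.
    y = np.array([
        score_fn([t for t, keep in zip(tokens, row) if keep])
        for row in X
    ])
    # Ridge regression in closed form: w = (X^T X + alpha*I)^{-1} X^T y.
    w = np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)
    return w  # higher weight -> larger estimated influence on the output
```

As a sanity check, if the score depends on a single token, that token should receive the largest estimated weight.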

🛡️ Threat Analysis

Membership Inference Attack

Core contribution is a novel membership inference attack (SMA) that determines the source of LLM-generated content, extending classical MIA from the binary question 'was this memorized?' to fine-grained source attribution (pre-training vs. retrieval vs. user input) in RAG systems, and outperforming existing MIA baselines.
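Given token-level influence weights like those SMA estimates, the final source decision could look like the sketch below. The segment labels, threshold, and decision rule are hypothetical simplifications for illustration, not the paper's exact criterion.

```python
def attribute_source(weights, segment_labels, threshold=0.5):
    """Aggregate per-token influence weights by input segment and pick
    the dominant source.

    weights:        per-token influence estimates (hypothetical output
                    of a token-level attribution step)
    segment_labels: per-token label, 'retrieval' or 'user_input'
    If total input influence is small, the output is attributed to
    pre-training (parametric memory) instead of either input segment.
    """
    totals = {"retrieval": 0.0, "user_input": 0.0}
    for w, label in zip(weights, segment_labels):
        totals[label] += max(w, 0.0)  # ignore negative influence
    if sum(totals.values()) < threshold:
        return "pre-training"           # little input influence
    return max(totals, key=totals.get)  # dominant input segment
```

For example, when most influence mass lies on tokens from the retrieved passage, the content is attributed to retrieval; near-zero influence everywhere points to parametric (pre-training) memory.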


Details

Domains
nlp, multimodal
Model Types
llm, vlm, transformer
Threat Tags
black_box, inference_time
Datasets
multiple textual and multimodal RAG benchmarks
Applications
retrieval-augmented generation, multimodal rag, llm privacy auditing, content provenance attribution