SMA: Who Said That? Auditing Membership Leakage in Semi-Black-box RAG Controlling

Shixuan Sun 1,2, Siyuan Liang 3, Ruoyu Chen 2, Jianjie Huang 1,4, Jingzhi Li 5, Xiaochun Cao 1,5


Published on arXiv: 2508.09105

Membership Inference Attack (OWASP ML Top 10 — ML04)

Sensitive Information Disclosure (OWASP LLM Top 10 — LLM06)

Key Finding

SMA outperforms state-of-the-art black-box MIA baselines by +15.74% accuracy and +10.01% coverage under noise and zero-gradient conditions across six LLMs.

SMA (Source-aware Membership Audit)

Novel technique introduced


Retrieval-Augmented Generation (RAG) and its multimodal extension, Multimodal Retrieval-Augmented Generation (MRAG), significantly improve the knowledge coverage and contextual understanding of Large Language Models (LLMs) by introducing external knowledge sources. However, retrieval and multimodal fusion obscure content provenance, rendering existing membership inference methods unable to reliably attribute generated outputs to pre-training data, external retrieval, or user input, thus undermining privacy leakage accountability. To address these challenges, we propose the first Source-aware Membership Audit (SMA), which enables fine-grained source attribution of generated content in a semi-black-box setting with retrieval control capabilities. To address the environmental constraints of semi-black-box auditing, we further design an attribution estimation mechanism based on zero-order optimization, which robustly approximates the true influence of input tokens on the output through large-scale perturbation sampling and ridge regression modeling. In addition, SMA introduces a cross-modal attribution technique that projects image inputs into textual descriptions via MLLMs, enabling token-level attribution in the text modality and, for the first time, membership inference on image retrieval traces in MRAG systems. This work shifts the focus of membership inference from 'whether the data has been memorized' to 'where the content is sourced from', offering a novel perspective for auditing data provenance in complex generative systems.


Key Contributions

  • First source-aware membership audit (SMA) for RAG/MRAG systems, determining whether LLM output content originates from pre-training data, external retrieval, or user input in a semi-black-box setting
  • Zero-order optimization-based attribution estimation using large-scale perturbation sampling and ridge regression to approximate token influence without gradient access
  • Cross-modal attribution technique projecting image inputs into text via MLLMs to enable token-level membership inference on image retrieval traces in MRAG systems
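The zero-order attribution step above can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: `score_fn` is a hypothetical black-box callable standing in for querying the audited RAG system, and the masking rate, sample count, and ridge penalty are illustrative defaults.

```python
import numpy as np

def zero_order_attribution(tokens, score_fn, n_samples=200, mask_prob=0.3,
                           alpha=1.0, seed=0):
    """Estimate per-token influence on a black-box output score via
    random masking perturbations and ridge regression (no gradients).

    tokens:   list of input tokens
    score_fn: black-box callable mapping a (masked) token list to a
              scalar output score, e.g. similarity of the system's
              response to the original response
    """
    rng = np.random.default_rng(seed)
    d = len(tokens)
    # Each row is a random keep/mask pattern over the input tokens.
    X = (rng.random((n_samples, d)) > mask_prob).astype(float)
    # Query the black box once per perturbed input.
    y = np.array([
        score_fn([t for t, keep in zip(tokens, row) if keep])
        for row in X
    ])
    # Ridge regression in closed form: w = (X^T X + alpha*I)^{-1} X^T y.
    w = np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)
    return w  # higher weight -> larger estimated influence on the output
```

As a sanity check, if the score depends on a single token, that token should receive the largest estimated weight.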

🛡️ Threat Analysis

Membership Inference Attack

Core contribution is a novel membership inference attack (SMA) that determines the source of LLM-generated content, extending classical MIA from the binary question 'was this memorized?' to fine-grained source attribution (pre-training vs. retrieval vs. user input) in RAG systems, and outperforming existing MIA baselines.
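Given token-level influence weights like those SMA estimates, the final source decision could look like the sketch below. The segment labels, threshold, and decision rule are hypothetical simplifications for illustration, not the paper's exact criterion.

```python
def attribute_source(weights, segment_labels, threshold=0.5):
    """Aggregate per-token influence weights by input segment and pick
    the dominant source.

    weights:        per-token influence estimates (hypothetical output
                    of a token-level attribution step)
    segment_labels: per-token label, 'retrieval' or 'user_input'
    If total input influence is small, the output is attributed to
    pre-training (parametric memory) instead of either input segment.
    """
    totals = {"retrieval": 0.0, "user_input": 0.0}
    for w, label in zip(weights, segment_labels):
        totals[label] += max(w, 0.0)  # ignore negative influence
    if sum(totals.values()) < threshold:
        return "pre-training"           # little input influence
    return max(totals, key=totals.get)  # dominant input segment
```

For example, when most influence mass lies on tokens from the retrieved passage, the content is attributed to retrieval; near-zero influence everywhere points to parametric (pre-training) memory.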


Details

Domains
nlp, multimodal
Model Types
llm, vlm, transformer
Threat Tags
black_box, inference_time
Datasets
multiple textual and multimodal RAG benchmarks
Applications
retrieval-augmented generation, multimodal rag, llm privacy auditing, content provenance attribution