A Systemic Evaluation of Multimodal RAG Privacy
Published on arXiv
2601.17644
- Membership Inference Attack (OWASP ML Top 10 — ML04)
- Sensitive Information Disclosure (OWASP LLM Top 10 — LLM06)
Key Finding
The membership inference attack (MIA) achieves an F1-score of 0.993 on exact-match images; image transformations reduce but do not eliminate the risk (0.60 F1 under rotation), while exact-match caption extraction drops to 0.41 on complex medical datasets.
The growing adoption of multimodal Retrieval-Augmented Generation (mRAG) pipelines for vision-centric tasks (e.g., visual question answering) introduces important privacy challenges. In particular, while mRAG offers a practical way to connect private datasets to improve model performance, it risks leaking private information from these datasets during inference. In this paper, we perform an empirical study of the privacy risks inherent in the mRAG pipeline as observed through standard model prompting. Specifically, we implement a case study that attempts to infer whether a visual asset (e.g., an image) is included in the mRAG database and, if present, to leak its associated metadata (e.g., the caption). Our findings highlight the need for privacy-preserving mechanisms and motivate future research on mRAG privacy.
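The probing strategy described above can be sketched as follows. This is a minimal, self-contained illustration, not the paper's implementation: the in-memory store, the content-hash "features", and the function names are all assumptions standing in for a real vision encoder and retrieval index. The attacker's heuristic is simply that if a probe image causes the pipeline to surface a stored caption, the image is inferred to be a member of the private database.

```python
import hashlib

# Toy private mRAG store: image id -> (feature bytes, caption metadata).
# A real pipeline would index vision-encoder embeddings; content hashes
# stand in here, so only exact-match membership is detectable.
private_store = {
    "img_a": (hashlib.sha256(b"pixels-of-a").digest(), "chest x-ray, frontal view"),
    "img_b": (hashlib.sha256(b"pixels-of-b").digest(), "abdominal CT slice"),
}

def retrieve(query_bytes: bytes):
    """Return the caption of a stored image whose features match the probe, if any."""
    q = hashlib.sha256(query_bytes).digest()
    for feat, caption in private_store.values():
        if feat == q:
            return caption
    return None

def membership_inference(query_bytes: bytes) -> bool:
    """Attacker heuristic: a surfaced caption for the probe implies membership."""
    return retrieve(query_bytes) is not None

assert membership_inference(b"pixels-of-a")      # member image: caption leaks
assert not membership_inference(b"pixels-of-x")  # non-member: nothing surfaced
```

With exact-match features, this probe is trivially precise, which mirrors why the exact-match setting yields near-perfect MIA scores; real retrieval uses similarity search, which is what makes transformed copies partially detectable as well.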
Key Contributions
- Systematic empirical evaluation of membership inference attacks on multimodal RAG pipelines, including robustness to image transformations (rotation, crop, mask)
- Analysis of image caption retrieval (metadata extraction) attacks, showing success rates ranging from 0.41 to 0.75 exact-match depending on image complexity
- Investigation of system-level design choices (prompt structure, context ordering, reranking, retrieval pool size) that modulate privacy leakage severity
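The transformation-robustness result in the first contribution has a simple intuition: retrieval keys on embedding similarity, and embeddings are partially invariant to benign transforms, so a rotated copy can still retrieve (and thus expose) its original. The sketch below is an assumption-laden toy: the "embedding" is just a pixel-value histogram, which is exactly invariant to rotation but degraded by masking, standing in for a real vision encoder's partial invariances.

```python
from collections import Counter

def embed(img):
    """Toy embedding: pixel-value histogram. Rotation permutes pixels but
    preserves their multiset, so this embedding is rotation-invariant;
    masking replaces pixels and shifts the histogram."""
    return Counter(px for row in img for px in row)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse histogram vectors."""
    keys = set(a) | set(b)
    dot = sum(a[k] * b[k] for k in keys)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb)

stored  = ((1, 2), (3, 4))   # 2x2 "image" in the private store
rotated = ((3, 1), (4, 2))   # 90-degree rotation: same pixel multiset
masked  = ((0, 0), (3, 4))   # top row masked out with zeros

assert cosine(embed(stored), embed(rotated)) == 1.0  # rotation evades nothing here
assert cosine(embed(stored), embed(masked)) < 1.0    # masking lowers similarity
```

Under this caricature, rotation leaves retrieval similarity untouched while masking reduces it, matching the qualitative finding that transformations lower but do not eliminate membership leakage.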
🛡️ Threat Analysis
RQ1 is explicitly a membership inference attack — determining whether a specific image is present in the private mRAG database, achieving 0.993 F1-score under exact-match conditions.