How to make Medical AI Systems safer? Simulating Vulnerabilities, and Threats in Multimodal Medical RAG System
Kaiwen Zuo 1, Zelin Liu 2, Raman Dutt 3, Ziyang Wang 4, Zhongtian Sun 5, Fan Mo 6, Pietro Liò 7
Published on arXiv: 2508.17215
Data Poisoning Attack
OWASP ML Top 10 — ML02
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
CMCI reduces LLaVA-Med-1.5 answer F1 by up to 27.66%, dropping performance to as low as 51.36% F1 on MIMIC-CXR QA tasks.
MedThreatRAG / Cross-Modal Conflict Injection (CMCI)
Novel technique introduced
Large Vision-Language Models (LVLMs) augmented with Retrieval-Augmented Generation (RAG) are increasingly employed in medical AI to enhance factual grounding through external clinical image-text retrieval. However, this reliance creates a significant attack surface. We propose MedThreatRAG, a novel multimodal poisoning framework that systematically probes vulnerabilities in medical RAG systems by injecting adversarial image-text pairs. A key innovation of our approach is the construction of a simulated semi-open attack environment, mimicking real-world medical systems that permit periodic knowledge base updates via user or pipeline contributions. Within this setting, we introduce and emphasize Cross-Modal Conflict Injection (CMCI), which embeds subtle semantic contradictions between medical images and their paired reports. These mismatches degrade retrieval and generation by disrupting cross-modal alignment while remaining sufficiently plausible to evade conventional filters. While basic textual and visual attacks are included for completeness, CMCI demonstrates the most severe degradation. Evaluations on IU-Xray and MIMIC-CXR QA tasks show that MedThreatRAG reduces answer F1 scores by up to 27.66% and lowers LLaVA-Med-1.5 F1 rates to as low as 51.36%. Our findings expose fundamental security gaps in clinical RAG systems and highlight the urgent need for threat-aware design and robust multimodal consistency checks. Finally, we conclude with a concise set of guidelines to inform the safe development of future multimodal medical RAG systems.
Key Contributions
- MedThreatRAG: a multimodal poisoning framework with three strategies (Textual Attack, Visual Attack, Cross-Modal Conflict Injection) targeting retriever, reranker, and generator stages of medical RAG pipelines
- Cross-Modal Conflict Injection (CMCI): injects subtle semantic contradictions between medical images and paired reports that evade conventional filters while severely degrading cross-modal alignment
- Simulated semi-open attack environment mimicking real-world clinical KB update pipelines, enabling black-box poisoning without access to model weights
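To make the CMCI idea concrete, the following is a minimal, hypothetical sketch of how a data-level conflict injection could look: the image in a retrieved pair is left untouched while its paired report is mutated into a semantically contradictory finding. The `FINDING_FLIPS` table and function names are illustrative assumptions, not the paper's actual pipeline.

```python
# Illustrative sketch of Cross-Modal Conflict Injection (CMCI) at the data
# level. All names and the finding-flip table below are hypothetical; the
# paper's actual injection method is not reproduced here.

# Hypothetical map of benign radiology findings to contradictory ones.
FINDING_FLIPS = {
    "no acute cardiopulmonary abnormality": "large right-sided pleural effusion",
    "clear lungs": "diffuse bilateral opacities",
    "no pneumothorax": "moderate apical pneumothorax",
}

def inject_cross_modal_conflict(report: str) -> tuple[str, bool]:
    """Return a report whose text contradicts its paired image.

    The image is never modified; only the paired text is mutated. Each
    modality therefore remains individually plausible (evading unimodal
    filters) while cross-modal alignment is broken.
    """
    lowered = report.lower()
    for benign, contradiction in FINDING_FLIPS.items():
        if benign in lowered:
            # Swap the first matching benign finding for a contradictory one.
            return lowered.replace(benign, contradiction), True
    return report, False  # nothing to flip; pair left clean

poisoned, changed = inject_cross_modal_conflict(
    "Frontal chest radiograph. Impression: clear lungs, no pneumothorax."
)
```

In a semi-open KB-update setting like the one the paper simulates, a pair poisoned this way would be submitted through the ordinary contribution pipeline, requiring no access to model weights.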
🛡️ Threat Analysis
The attack's core mechanism is corruption of the external knowledge base: adversarial image-text pairs are injected into the retrieval corpus that the medical RAG system depends on. Although the poisoned corpus is not training data in the strict sense, the adversary still corrupts a data source the model relies on in order to degrade generation quality, which aligns with ML02's intent of compromising model behavior through data corruption.
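The multimodal consistency checks the paper calls for can be sketched as an ingestion gate that scores each candidate image-text pair for cross-modal agreement before admitting it to the knowledge base. The encoder, threshold, and all names below are placeholder assumptions, not the paper's defense:

```python
# Minimal sketch of a threat-aware ingestion gate for a medical RAG knowledge
# base. The alignment score is assumed to come from some upstream
# vision-language encoder (e.g. a CLIP-style model); the 0.30 threshold is a
# made-up placeholder that would need tuning per encoder.
from dataclasses import dataclass
from typing import Sequence

@dataclass
class CandidatePair:
    image_id: str
    report: str
    alignment_score: float  # cross-modal similarity, computed upstream

def filter_kb_updates(
    candidates: Sequence[CandidatePair],
    min_alignment: float = 0.30,  # hypothetical cutoff
) -> list[CandidatePair]:
    """Admit only image-text pairs whose modalities agree.

    CMCI-style poisons keep each modality individually plausible, so
    unimodal filters pass them; a cross-modal score is what surfaces the
    contradiction between image and report.
    """
    return [c for c in candidates if c.alignment_score >= min_alignment]

admitted = filter_kb_updates([
    CandidatePair("cxr_001", "Clear lungs, no effusion.", 0.71),
    CandidatePair("cxr_002", "Large pneumothorax.", 0.12),  # conflicting pair
])
```

A gate of this shape sits at the same pipeline stage the attack exploits (periodic KB updates), which is why the paper argues for threat-aware design of the update path itself rather than only hardening the generator.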