attack 2026

Hidden in the Metadata: Stealth Poisoning Attacks on Multimodal Retrieval-Augmented Generation

Kennedy Edemacu, Mohammad Mahdi Shokri

0 citations

Published on arXiv

2603.00172

Data Poisoning Attack

OWASP ML Top 10 — ML02

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

MM-MEPA achieves up to 91% attack success rate on MMQA and 66% on WebQA by manipulating only KB metadata, with existing defenses providing negligible mitigation.

MM-MEPA (Constrained Metadata Optimization)

Novel technique introduced


Retrieval-augmented generation (RAG) has emerged as a powerful paradigm for enhancing multimodal large language models by grounding their responses in external, factual knowledge and thus mitigating hallucinations. However, integrating externally sourced knowledge bases introduces a critical attack surface: adversaries can inject malicious multimodal content that influences both retrieval and downstream generation. In this work, we present MM-MEPA, a multimodal poisoning attack that targets the metadata components of image-text entries while leaving the associated visual content unaltered. By manipulating only the metadata, MM-MEPA can still steer multimodal retrieval and induce attacker-desired model responses. We evaluate the attack across multiple benchmark settings and demonstrate its severity. MM-MEPA achieves an attack success rate of up to 91%, consistently disrupting system behavior across four retrievers and two multimodal generators. Additionally, we assess representative defense strategies and find them largely ineffective against this form of metadata-only poisoning. Our findings expose a critical vulnerability in multimodal RAG and underscore the urgent need for more robust, defense-aware retrieval and knowledge integration methods.


Key Contributions

  • MM-MEPA: a metadata-only poisoning attack on multimodal RAG that manipulates image-text KB entries without touching visual content, achieving up to 91% attack success rate across four retrievers and two multimodal generators
  • Constrained Metadata Optimization (CMO) framework that formalizes metadata manipulation as a constrained optimization in the embedding space, balancing query relevance and image-metadata cohesion
  • Empirical evaluation showing that representative defenses (query-paraphrasing, image-metadata consistency checks) are largely ineffective against this form of stealth metadata poisoning
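
The trade-off named in the CMO contribution (query relevance versus image-metadata cohesion) can be illustrated with a toy selection rule. This is only a sketch under stated assumptions, not the paper's formulation: it assumes cosine similarity in a shared embedding space, a discrete pool of candidate metadata embeddings, and a hypothetical cohesion threshold `tau`. The function names and the toy vectors are illustrative and do not come from the paper.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def select_metadata(query_emb, image_emb, candidate_embs, tau=0.5):
    """Return the index of the candidate metadata embedding that is most
    similar to the query, among candidates whose similarity to the image
    is at least tau. Returns None if no candidate meets the constraint.

    Assumption: `tau` is a hypothetical cohesion threshold; the summary
    does not state how the paper enforces image-metadata cohesion.
    """
    best_idx, best_score = None, -np.inf
    for i, cand in enumerate(candidate_embs):
        if cosine(image_emb, cand) < tau:
            continue  # violates the cohesion constraint
        score = cosine(query_emb, cand)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx


# Toy 2-D embeddings (illustrative values only).
query = [1.0, 0.0]
image = [0.0, 1.0]
candidates = [
    [1.0, 0.0],  # closest to the query, but unrelated to the image
    [1.0, 1.0],  # compromise between query and image
    [0.0, 1.0],  # matches the image, ignores the query
]

chosen = select_metadata(query, image, candidates, tau=0.5)
print("chosen candidate index:", chosen)
```

With the toy values, the candidate closest to the query is rejected because it is unrelated to the image, and the compromise candidate is selected. Raising `tau` pushes the choice toward metadata that describes the image faithfully, at the cost of query relevance.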

🛡️ Threat Analysis

Data Poisoning Attack

MM-MEPA corrupts the external knowledge base by injecting or modifying image-metadata pairs — the attack vector is the retrieval data source itself, which degrades downstream system behavior and steers model responses. This is data poisoning of the retrieval corpus.
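
To make the threat concrete, the following toy example shows how a knowledge-base entry with an unchanged image but rewritten metadata could outrank a benign entry at retrieval time. It is a minimal sketch under stated assumptions: the retriever is assumed to score each entry by cosine similarity between the query embedding and the normalized sum of the image and metadata embeddings. That fusion rule, the helper names, and all vectors are hypothetical and are not taken from the paper or from any specific retriever.

```python
import numpy as np


def normalize(v):
    """Scale a vector to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def entry_embedding(image_emb, metadata_emb):
    """Assumed fusion rule: normalized sum of the unit-length image and
    metadata embeddings. Real retrievers may fuse modalities differently."""
    return normalize(normalize(image_emb) + normalize(metadata_emb))


def top_k(query_emb, kb, k=1):
    """Rank knowledge-base entries by cosine similarity to the query."""
    q = normalize(query_emb)
    scores = {
        name: float(q @ entry_embedding(img, meta))
        for name, (img, meta) in kb.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Each entry is (image embedding, metadata embedding); values are illustrative.
kb = {
    "benign": ([0.0, 1.0, 0.0], [0.6, 0.8, 0.0]),
    # Image embedding is left as-is; only the metadata points at the query.
    "poisoned": ([0.0, 0.0, 1.0], [1.0, 0.0, 0.2]),
}
query = [1.0, 0.0, 0.0]

print(top_k(query, kb, k=2))
```

In this toy setup the poisoned entry ranks first even though its image embedding is orthogonal to the query, because the metadata term alone carries enough similarity.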


Details

Domains
multimodal, nlp
Model Types
vlm, llm, multimodal
Threat Tags
grey_box, inference_time, targeted
Datasets
MultimodalQA (MMQA), WebQA
Applications
multimodal question answering, retrieval-augmented generation