Hidden in the Metadata: Stealth Poisoning Attacks on Multimodal Retrieval-Augmented Generation
Kennedy Edemacu, Mohammad Mahdi Shokri
Published on arXiv
2603.00172
Data Poisoning Attack
OWASP ML Top 10 — ML02
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
MM-MEPA achieves up to 91% attack success rate on MMQA and 66% on WebQA by manipulating only KB metadata, with existing defenses providing negligible mitigation.
MM-MEPA (Constrained Metadata Optimization)
Novel technique introduced
Retrieval-augmented generation (RAG) has emerged as a powerful paradigm for enhancing multimodal large language models by grounding their responses in external, factual knowledge and thus mitigating hallucinations. However, the integration of externally sourced knowledge bases introduces a critical attack surface: adversaries can inject malicious multimodal content capable of influencing both retrieval and downstream generation. In this work, we present MM-MEPA, a multimodal poisoning attack that targets the metadata components of image-text entries while leaving the associated visual content unaltered. By manipulating only the metadata, MM-MEPA can still steer multimodal retrieval and induce attacker-desired model responses. We evaluate the attack across multiple benchmark settings and demonstrate its severity. MM-MEPA achieves an attack success rate of up to 91%, consistently disrupting system behavior across four retrievers and two multimodal generators. Additionally, we assess representative defense strategies and find them largely ineffective against this form of metadata-only poisoning. Our findings expose a critical vulnerability in multimodal RAG and underscore the urgent need for more robust, defense-aware retrieval and knowledge integration methods.
Key Contributions
- MM-MEPA: a metadata-only poisoning attack on multimodal RAG that manipulates image-text KB entries without touching visual content, achieving up to 91% attack success rate across four retrievers and two multimodal generators
- Constrained Metadata Optimization (CMO) framework that formalizes metadata manipulation as a constrained optimization in the embedding space, balancing query relevance and image-metadata cohesion
- Empirical evaluation showing that representative defenses (query-paraphrasing, image-metadata consistency checks) are largely ineffective against this form of stealth metadata poisoning
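The trade-off described in the CMO contribution above can be pictured with a toy sketch. This is not the paper's algorithm: it assumes the constraint is a cosine-similarity floor `tau` between metadata and image embeddings, that query relevance is also measured by cosine similarity, and that the attacker picks from a small fixed set of candidate metadata embeddings. The function name `select_metadata`, the threshold `tau`, and the three-dimensional vectors are all hypothetical stand-ins for real encoder outputs.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_metadata(query_emb, image_emb, candidate_embs, tau):
    """Hypothetical selection rule (an assumption, not the paper's method):
    among candidates whose similarity to the image is at least tau,
    return the index of the one most similar to the target query.
    Returns None when no candidate satisfies the cohesion constraint."""
    best_idx, best_score = None, -np.inf
    for i, cand in enumerate(candidate_embs):
        if cosine(cand, image_emb) < tau:
            continue  # violates the image-metadata cohesion constraint
        score = cosine(cand, query_emb)  # query relevance objective
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx

# Toy embeddings standing in for encoder outputs (assumed values).
query_emb = np.array([1.0, 0.0, 0.0])
image_emb = np.array([0.0, 1.0, 0.0])
candidate_embs = np.array([
    [1.0, 0.0, 0.0],  # matches the query, unrelated to the image
    [0.6, 0.8, 0.0],  # cohesive with the image, partly query-aligned
    [0.0, 1.0, 0.0],  # matches the image only
])

chosen = select_metadata(query_emb, image_emb, candidate_embs, tau=0.5)
print(chosen)
```

With these toy vectors the first candidate is rejected by the cohesion floor even though it is the most query-relevant, which is the balance the contribution describes: relevance is maximized only within the set of metadata that still looks consistent with the unmodified image.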
🛡️ Threat Analysis
MM-MEPA corrupts the external knowledge base by injecting or modifying image-metadata pairs — the attack vector is the retrieval data source itself, which degrades downstream system behavior and steers model responses. This is data poisoning of the retrieval corpus.
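For readers unfamiliar with the image-metadata consistency checks listed among the evaluated defenses, the following is a minimal sketch of what such a filter over the retrieval corpus might look like. It assumes a simple cosine-similarity threshold on precomputed embeddings; the function `consistency_filter`, the entry layout, the threshold, and the two-dimensional vectors are hypothetical, and the numbers are illustrative only, not results from the paper.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def consistency_filter(entries, threshold):
    """Keep only KB entries whose metadata embedding is at least
    `threshold`-similar to the embedding of the paired image.
    (Hypothetical defense sketch, not the paper's implementation.)"""
    return [
        e for e in entries
        if cosine(e["image_emb"], e["meta_emb"]) >= threshold
    ]

# Toy knowledge base: embeddings are assumed values, not real encoder outputs.
image = np.array([0.0, 1.0])
entries = [
    {"id": "clean", "image_emb": image, "meta_emb": np.array([0.1, 0.9])},
    {"id": "crude_poison", "image_emb": image, "meta_emb": np.array([1.0, 0.0])},
    {"id": "cohesive_poison", "image_emb": image, "meta_emb": np.array([0.6, 0.8])},
]

kept_ids = [e["id"] for e in consistency_filter(entries, threshold=0.5)]
print(kept_ids)
```

In this toy setup the filter drops metadata that is unrelated to its image but keeps an entry whose metadata was shifted while staying close to the image embedding. That is one plausible way a threshold check could miss cohesion-preserving poisoning; it is offered as an illustration, not as the paper's explanation of why the defenses fail.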