attack 2026

VENOMREC: Cross-Modal Interactive Poisoning for Targeted Promotion in Multimodal LLM Recommender Systems

Guowei Guan ¹, Yurong Hao ¹, Jiaming Zhang ¹, Tiantong Wu ¹, Fuyao Zhang ¹, Tianxiang Chen ¹, Longtao Huang ¹,², Cyril Leung ¹,², Wei Yang Bryan Lim ¹

¹ Nanyang Technological University

² Alibaba Group

0 citations · 53 references · arXiv (Cornell University)

Published on arXiv: 2602.06409

Data Poisoning Attack

OWASP ML Top 10 — ML02

Training Data Poisoning

OWASP LLM Top 10 — LLM03

Key Finding

VENOMREC achieves 0.73 mean ER@20 across three real-world multimodal datasets, surpassing the strongest baseline by +0.52 absolute ER@20 points while maintaining comparable recommendation utility.
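The summary reports ER@20 without defining it. The sketch below assumes the common reading of ER@K as an exposure rate: the fraction of users whose top-K recommendation list contains the target item. The function name `exposure_rate_at_k` and the toy rankings are hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable, List


def exposure_rate_at_k(
    top_lists: Dict[Hashable, List[Hashable]],
    target_item: Hashable,
    k: int = 20,
) -> float:
    """Fraction of users whose top-k list contains target_item.

    Assumption: ER@K is an exposure rate over users; the paper may
    define it differently (for example, excluding users who already
    interacted with the target).
    """
    if not top_lists:
        return 0.0
    hits = sum(1 for ranked in top_lists.values() if target_item in ranked[:k])
    return hits / len(top_lists)


# Toy rankings (hypothetical): user id -> items ordered best-first.
rankings = {
    "u1": ["i3", "i7", "i1"],
    "u2": ["i7", "i2", "i9"],
    "u3": ["i4", "i5", "i6"],
    "u4": ["i1", "i7", "i8"],
}

# With k=2, the target "i7" appears in the lists of u1, u2 and u4.
er = exposure_rate_at_k(rankings, "i7", k=2)
print(f"ER@2 = {er:.2f}")
```

Under this reading, a mean ER@20 of 0.73 would mean the promoted item reaches the top 20 for roughly 73% of users, averaged over the three datasets.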

Novel technique introduced: VENOMREC

Multimodal large language models (MLLMs) are pushing recommender systems (RecSys) toward content-grounded retrieval and ranking via cross-modal fusion. We find that while cross-modal consensus often mitigates conventional poisoning that manipulates interaction logs or perturbs a single modality, it also introduces a new attack surface where synchronised multimodal poisoning can reliably steer fused representations along stable semantic directions during fine-tuning. To characterise this threat, we formalise cross-modal interactive poisoning and propose VENOMREC, which performs Exposure Alignment to identify high-exposure regions in the joint embedding space and Cross-modal Interactive Perturbation to craft attention-guided coupled token-patch edits. Experiments on three real-world multimodal datasets demonstrate that VENOMREC consistently outperforms strong baselines, achieving 0.73 mean ER@20 and improving over the strongest baseline by +0.52 absolute ER points on average, while maintaining comparable recommendation utility.

Key Contributions

First formalization of cross-modal interactive poisoning as a distinct threat against MLLM-based recommender systems, showing that cross-modal consensus — while suppressing single-modality noise — creates a new amplification surface for synchronized attacks.
Exposure Alignment (EA) technique that identifies high-exposure 'hotspot' regions in the joint embedding space to set the attack's optimization target.
Cross-modal Interactive Perturbation (CIP) algorithm that leverages cross-modal attention to identify salient token-patch pairs and crafts coupled, stealthy perturbations achieving 0.73 mean ER@20, outperforming the best baseline by +0.52 absolute ER points.
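The two components listed above can be illustrated with a rough sketch. It is not the authors' implementation: it assumes a "hotspot" can be approximated by the mean embedding of the most-exposed items, and that salient token-patch pairs are simply the largest entries of a token-by-patch cross-attention matrix. The names `hotspot_centroid` and `top_token_patch_pairs`, the `top_frac` parameter, and all toy values are hypothetical.

```python
from typing import List, Tuple


def hotspot_centroid(
    embeddings: List[List[float]],
    exposure: List[float],
    top_frac: float = 0.5,
) -> List[float]:
    """Mean embedding of the most-exposed items (a stand-in for an EA hotspot)."""
    n_top = max(1, int(len(embeddings) * top_frac))
    # Indices of the n_top items with the highest exposure counts.
    order = sorted(range(len(embeddings)), key=lambda i: exposure[i], reverse=True)[:n_top]
    dim = len(embeddings[0])
    return [sum(embeddings[i][d] for i in order) / n_top for d in range(dim)]


def top_token_patch_pairs(
    attention: List[List[float]],
    n_pairs: int,
) -> List[Tuple[int, int]]:
    """(token, patch) index pairs with the largest cross-attention weights."""
    scored = [(w, t, p) for t, row in enumerate(attention) for p, w in enumerate(row)]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [(t, p) for _, t, p in scored[:n_pairs]]


# Toy joint-space item embeddings and exposure counts (hypothetical).
item_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
item_exposure = [10.0, 1.0, 30.0, 2.0]
hotspot = hotspot_centroid(item_embeddings, item_exposure, top_frac=0.5)

# Toy cross-attention: rows are text tokens, columns are image patches.
cross_attention = [
    [0.1, 0.6, 0.3],
    [0.7, 0.2, 0.1],
]
pairs = top_token_patch_pairs(cross_attention, n_pairs=2)

print("hotspot:", hotspot)
print("salient (token, patch) pairs:", pairs)
```

In this reading, the hotspot fixes where the poisoned item should land in the joint embedding space, and the selected pairs indicate which text tokens and image patches would be edited together so that both modalities push in the same direction.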

🛡️ Threat Analysis

Data Poisoning Attack

VENOMREC corrupts training/fine-tuning data by injecting crafted multimodal (text + image) poisoned samples. When the victim model fine-tunes on the poisoned dataset, its fused representations are steered toward a target item's 'hotspot' direction, increasing its recommendation probability — a textbook targeted data poisoning attack.
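The geometry behind "steering toward a hotspot direction" can be shown with a toy example. This sketch moves a fused embedding directly, under a bounded L2 shift, and checks that its cosine similarity to the hotspot rises. That is a simplification: the attack described above edits the text and image inputs, and the shift in the victim's representations arises during fine-tuning. The helper `steer_toward`, the `budget` parameter, and the vectors are hypothetical.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def steer_toward(z: List[float], hotspot: List[float], budget: float) -> List[float]:
    """Move z toward hotspot by at most `budget` in L2 norm."""
    delta = [h - x for x, h in zip(z, hotspot)]
    norm = math.sqrt(sum(d * d for d in delta))
    if norm == 0.0:
        return list(z)
    # Never overshoot the hotspot, and never exceed the shift budget.
    scale = min(1.0, budget / norm)
    return [x + scale * d for x, d in zip(z, delta)]


# Toy fused embedding of the target item and a toy hotspot (hypothetical).
z_clean = [1.0, 0.0]
hotspot_vec = [0.0, 1.0]
z_poisoned = steer_toward(z_clean, hotspot_vec, budget=0.5)

before = cosine(z_clean, hotspot_vec)
after = cosine(z_poisoned, hotspot_vec)
shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(z_poisoned, z_clean)))

print(f"cosine before: {before:.3f}, after: {after:.3f}, L2 shift: {shift:.3f}")
```

If recommendation scores depend on similarity between a user's representation and item representations in the fused space, a target item that sits closer to a high-exposure region would be expected to appear in more top-K lists, which is the effect ER@K measures.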

Details

Domains

multimodal, nlp, vision

Model Types

llm, vlm, multimodal

Threat Tags

training_time, targeted

Applications

multimodal recommender systems, content-based retrieval and ranking

Similar Papers

Cost-Minimized Label-Flipping Poisoning Attack to LLM Alignment

Associative Poisoning to Generative Machine Learning

Layer of Truth: Probing Belief Shifts under Continual Pre-Training Poisoning

Thought-Transfer: Indirect Targeted Poisoning Attacks on Chain-of-Thought Reasoning Models

RAG-targeted Adversarial Attack on LLM-based Threat Detection and Mitigation Framework

Silent Sabotage During Fine-Tuning: Few-Shot Rationale Poisoning of Compact Medical LLMs

Infusion: Shaping Model Behavior by Editing Training Data via Influence Functions

A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models