
LAMP: Learning Universal Adversarial Perturbations for Multi-Image Tasks via Pre-trained Models

Alvi Md Ishmam, Najibul Haque Sarker, Zaber Ibn Abdul Hakim, Chris Thomas

0 citations · 45 references · arXiv


Published on arXiv · 2601.21220

Input Manipulation Attack

OWASP ML Top 10 — ML01

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

LAMP outperforms SOTA baselines in attack success rate across multiple multi-image vision-language tasks and MLLM architectures in a black-box setting, succeeding even when only a subset of inference images are perturbed.

LAMP

Novel technique introduced


Multimodal Large Language Models (MLLMs) have achieved remarkable performance across vision-language tasks. Recent advancements allow these models to process multiple images as inputs. However, the vulnerabilities of multi-image MLLMs remain unexplored. Existing adversarial attacks focus on single-image settings and often assume a white-box threat model, which is impractical in many real-world scenarios. This paper introduces LAMP, a black-box method for learning Universal Adversarial Perturbations (UAPs) targeting multi-image MLLMs. LAMP applies an attention-based constraint that prevents the model from effectively aggregating information across images. LAMP also introduces a novel cross-image contagious constraint that forces perturbed tokens to influence clean tokens, spreading adversarial effects without requiring all inputs to be modified. Additionally, an index-attention suppression loss enables a robust position-invariant attack. Experimental results show that LAMP outperforms SOTA baselines and achieves the highest attack success rates across multiple vision-language tasks and models.
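The optimization described above can be pictured with a toy sketch (not the authors' code): a perturbation delta is trained by projected gradient ascent against a frozen surrogate encoder so that perturbed features move away from clean ones. Here the encoder is reduced to a single linear map `W`, and the loss ||W @ delta||^2 stands in for LAMP's attention- and hidden-state dissimilarity objectives; all names and hyperparameters are illustrative assumptions.

```python
import numpy as np

def learn_uap(W, shape, eps=8 / 255, lr=0.05, steps=200):
    """Toy stand-in for surrogate-based UAP learning: gradient ascent on
    L(delta) = ||W @ delta||^2 (feature gap between clean x and x + delta
    under a frozen linear encoder W), projected onto the L-inf ball of
    radius eps after every step. Illustrative only, not LAMP's objective."""
    rng = np.random.default_rng(0)
    delta = rng.uniform(-eps, eps, size=shape)  # random init inside the budget
    for _ in range(steps):
        grad = 2 * W.T @ (W @ delta)                    # d/d(delta) ||W delta||^2
        delta = np.clip(delta + lr * grad, -eps, eps)   # ascent step + projection
    return delta
```

Because the model stays frozen, only `delta` is updated; the same perturbation is then reused across inputs, which is what makes it "universal."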


Key Contributions

  • First adversarial attack explicitly targeting multi-image MLLMs, transferable across models and tasks without requiring knowledge of downstream architectures.
  • Attention-based UAP learning via Pompeiu-Hausdorff distance on self-attention weights, keeping the pre-trained MLLM frozen during optimization.
  • Novel 'contagious' cross-image objective enabling perturbed visual tokens to infect clean tokens, plus index-attention suppression loss for position-invariant attacks.
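As a minimal illustration of the attention-based objective, the Pompeiu-Hausdorff distance between two finite point sets (for instance, the rows of clean vs. perturbed self-attention maps) can be computed as below. Treating attention rows as a point set is an assumption for the sketch, not the paper's exact formulation:

```python
import numpy as np

def pompeiu_hausdorff(A, B):
    """Pompeiu-Hausdorff distance between point sets A (m, d) and B (n, d):
    the larger of the two directed Hausdorff distances."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # (m, n) pairwise
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# Example: rows of a clean vs. a perturbed attention map as point sets
clean = np.array([[0.0, 0.0], [1.0, 0.0]])
pert = np.array([[0.0, 2.0]])
print(pompeiu_hausdorff(clean, pert))  # ≈ 2.236
```

Maximizing this distance pushes the perturbed attention pattern as far as possible from its clean counterpart, which is how the constraint disrupts cross-image information aggregation.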

🛡️ Threat Analysis

Input Manipulation Attack

Crafts adversarial visual perturbations (UAPs) applied to image inputs of multi-image MLLMs at inference time to disrupt correct outputs — a textbook input manipulation attack. Uses Pompeiu-Hausdorff distance to target attention heads and maximize dissimilarity between clean and perturbed hidden states.
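At inference time, applying a learned UAP amounts to adding the same perturbation to each targeted image while respecting a pixel-space budget. A minimal sketch, assuming normalized [0, 1] pixels and a conventional L-infinity budget of 8/255 (the budget value is an assumption, not taken from the paper):

```python
import numpy as np

def apply_uap(images, delta, eps=8 / 255):
    """Add one universal perturbation delta (broadcast over the batch),
    clipped to an L-inf ball of radius eps and to the valid pixel range."""
    delta = np.clip(delta, -eps, eps)          # enforce the perturbation budget
    return np.clip(images + delta, 0.0, 1.0)   # keep pixels in [0, 1]
```

Because the perturbation is universal, an attacker needs no per-input optimization at inference, and per the paper's key finding, perturbing only a subset of the input images can still succeed.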


Details

Domains
vision · NLP · multimodal
Model Types
VLM · LLM · multimodal
Threat Tags
black_box · inference_time · untargeted · digital
Datasets
NLVR2 · DreamSim · NeXT-QA
Applications
visual question answering · multi-image reasoning · image comparison · vision-language tasks