Visual Memory Injection Attacks for Multi-Turn Conversations
Christian Schlarmann, Matthias Hein
Published on arXiv
2602.15927
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
VMI attack successfully injects targeted outputs (e.g., 'buy GameStop stock') across 25+ unrelated conversation turns on multiple open-weight LVLMs, with transferability to unseen prompts, contexts, and fine-tuned model variants
VMI (Visual Memory Injection)
Novel technique introduced
Generative large vision-language models (LVLMs) have recently achieved impressive performance gains, and their user base is growing rapidly. However, the security of LVLMs, particularly in long-context multi-turn settings, remains largely underexplored. In this paper, we consider a realistic scenario in which an attacker uploads a manipulated image to the web or social media. A benign user downloads this image and uses it as input to an LVLM. Our novel stealthy Visual Memory Injection (VMI) attack is designed so that the LVLM exhibits nominal behavior on normal prompts, but once the user gives a triggering prompt, the LVLM outputs a specific prescribed target message to manipulate the user, e.g., for adversarial marketing or political persuasion. Compared to previous work, which focused on single-turn attacks, VMI remains effective even after a long multi-turn conversation with the user. We demonstrate our attack on several recent open-weight LVLMs, showing that large-scale manipulation of users via perturbed images is feasible in multi-turn conversation settings and calling for better robustness of LVLMs against such attacks. We release the source code at https://github.com/chs20/visual-memory-injection
Key Contributions
- Visual Memory Injection (VMI) attack: adversarial image perturbations that persist across 25+ unrelated multi-turn conversation turns in LVLMs, injecting attacker-specified target outputs triggered only by topic-related prompts
- Benign anchoring technique that jointly optimizes for nominal first-turn behavior alongside malicious n-th turn output, preventing model degeneration and evading user suspicion
- Context-cycling optimization that varies conversation context lengths during attack generation, enabling persistence across variable-length dialogues and transferability to unseen prompts, contexts, and fine-tuned model variants
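The benign-anchoring and context-cycling ideas above can be illustrated with a minimal NumPy sketch. This is a toy stand-in, not the paper's implementation: the quadratic `benign_loss` and `target_loss` functions are hypothetical placeholders for the cross-entropy losses over the LVLM's first-turn (nominal) and n-th-turn (injected) outputs, and the context-length-dependent target is invented for illustration. What the sketch does capture is the structure: a weighted joint objective, a conversation length resampled every iteration, and an L∞ budget on the image perturbation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two objectives in benign anchoring:
#  - benign_loss: the first-turn answer should stay nominal (small change)
#  - target_loss: the n-th-turn answer should be the injected message
def benign_loss(delta):
    return np.sum(delta ** 2)

def target_loss(delta, context_len):
    t = 0.5 + 0.01 * context_len  # toy target that shifts with context length
    return np.sum((delta - t) ** 2)

def combined_loss_and_grad(delta, context_len, lam=0.5):
    # Benign anchoring: jointly optimize nominal and malicious behavior.
    loss = lam * benign_loss(delta) + (1 - lam) * target_loss(delta, context_len)
    t = 0.5 + 0.01 * context_len
    grad = lam * 2 * delta + (1 - lam) * 2 * (delta - t)  # analytic gradient
    return loss, grad

delta = np.zeros(4)            # toy "image perturbation"
eps, step = 8 / 255, 0.01      # L-infinity budget and step size
for _ in range(200):
    # Context cycling: resample the conversation length each iteration so
    # the perturbation persists across variable-length dialogues.
    n_turns = rng.integers(1, 26)
    _, grad = combined_loss_and_grad(delta, n_turns)
    delta = np.clip(delta - step * np.sign(grad), -eps, eps)  # stay in budget
```

The key design point mirrored here is that the perturbation is never optimized for a single fixed conversation: each step sees a different context length, which is what the paper credits for transfer to unseen prompts and contexts.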
🛡️ Threat Analysis
VMI applies gradient-based adversarial perturbations to images to force LVLMs to output attacker-specified target messages at inference time, a direct visual input manipulation attack. The perturbations are small enough to remain imperceptible to users while persisting across long multi-turn conversations.
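A single update of such a gradient-based, imperceptibly bounded attack typically looks like a projected sign-gradient (PGD-style) step. The sketch below is an assumption about the general shape of the optimization, not the paper's exact procedure; the gradient here is a dummy array standing in for the backpropagated gradient of the attack loss through the LVLM's vision encoder.

```python
import numpy as np

def pgd_step(image, delta, grad, eps=8 / 255, step=2 / 255):
    """One L-infinity PGD update: move against the sign of the loss gradient,
    project back into the eps-ball, and keep pixels in the valid [0, 1] range."""
    delta = delta - step * np.sign(grad)              # descend the attack loss
    delta = np.clip(delta, -eps, eps)                 # imperceptibility budget
    return np.clip(image + delta, 0.0, 1.0) - image   # valid perturbed image

image = np.full((3, 2, 2), 0.5)     # toy RGB image with pixels in [0, 1]
delta = np.zeros_like(image)
grad = np.ones_like(image)          # placeholder for the LVLM loss gradient
delta = pgd_step(image, delta, grad)
# After one step, every entry of delta equals -2/255, well inside the budget.
```

The `eps = 8/255` budget is a common choice in the adversarial-examples literature for perturbations that are hard to notice by eye, which is what makes the manipulated image plausible as an ordinary download from the web or social media.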