
Text Prompt Injection of Vision Language Models

Ruizhe Zhu

2 citations · 26 references · arXiv


Published on arXiv: 2510.09849

Input Manipulation Attack

OWASP ML Top 10 — ML01

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Text prompt injection into images achieves high success rates against LLaVA-Next-72B with dramatically lower computational cost than gradient-based adversarial attacks

Text Prompt Injection

Novel technique introduced


The widespread application of large vision language models has significantly raised safety concerns. In this project, we investigate text prompt injection, a simple yet effective method to mislead these models. We developed an algorithm for this type of attack and demonstrated its effectiveness and efficiency through experiments. Compared to other attack methods, our approach is particularly effective against large models while requiring far fewer computational resources.


Key Contributions

  • Systematic text prompt injection algorithm that embeds adversarial text within images to mislead VLMs, requiring no gradient access
  • Empirical demonstration that text prompt injection achieves high attack success rates on LLaVA-Next-72B with significantly less GPU compute than gradient-based attacks
  • Comprehensive analysis of placement and embedding techniques for injected text prompts within images
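The paper's exact algorithm is not reproduced here, but the core idea of the contributions above, stamping adversarial text into the image and sweeping candidate placements using only black-box queries, can be sketched as follows. All names (`stamp_patch`, `best_placement`, the corner set) are illustrative assumptions, not the authors' API; the high-contrast patch stands in for rendered adversarial text.

```python
# Minimal sketch of black-box text prompt injection, assuming a grayscale
# image represented as a nested list of pixel values. The "injected text"
# is modeled as a high-contrast rectangular patch; a real attack would
# render an adversarial instruction string into that region instead.

CORNERS = ("top_left", "top_right", "bottom_left", "bottom_right")

def stamp_patch(image, corner, h, w, value=255):
    """Return a copy of `image` with an h x w patch (standing in for
    rendered adversarial text) embedded at the given corner."""
    H, W = len(image), len(image[0])
    out = [row[:] for row in image]          # do not mutate the original
    r0 = 0 if corner.startswith("top") else H - h
    c0 = 0 if corner.endswith("left") else W - w
    for r in range(r0, r0 + h):
        for c in range(c0, c0 + w):
            out[r][c] = value
    return out

def best_placement(image, h, w, score):
    """Placement sweep: try each candidate position and keep the one that
    maximizes an attack-success score. `score` would wrap a forward query
    to the target VLM -- no gradient access is required, matching the
    black-box, inference-time setting described above."""
    candidates = {c: stamp_patch(image, c, h, w) for c in CORNERS}
    best = max(candidates.items(), key=lambda kv: score(kv[1]))
    return best  # (corner, adversarial image)
```

In practice `score` is the expensive part: each candidate costs one model query, which is why this style of attack scales to large targets like LLaVA-Next-72B far more cheaply than gradient-based patch optimization.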

🛡️ Threat Analysis

Input Manipulation Attack

The attack crafts adversarial visual inputs (images with embedded text overlays functioning as adversarial patches) to manipulate VLM outputs at inference time — consistent with the dual-tagging convention (ML01 + LLM01) for adversarial visual inputs that jailbreak or manipulate VLM outputs.


Details

Domains
vision · nlp · multimodal
Model Types
vlm · llm
Threat Tags
black_box · inference_time · targeted · digital
Applications
vision-language models · image-grounded chatbots