Text Prompt Injection of Vision Language Models
Published on arXiv (2510.09849)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Text prompt injection into images achieves high success rates against LLaVA-Next-72B with dramatically lower computational cost than gradient-based adversarial attacks
Text Prompt Injection
Novel technique introduced
The widespread application of large vision language models has significantly raised safety concerns. In this project, we investigate text prompt injection, a simple yet effective method for misleading these models. We develop an algorithm for this type of attack and demonstrate its effectiveness and efficiency through experiments. Compared with other attack methods, our approach is particularly effective against large models while requiring far less computation.
Key Contributions
- Systematic text prompt injection algorithm that embeds adversarial text within images to mislead VLMs, requiring no gradient access
- Empirical demonstration that text prompt injection achieves high attack success rates on LLaVA-Next-72B with significantly less GPU compute than gradient-based attacks
- Comprehensive analysis of placement and embedding techniques for injected text prompts within images
🛡️ Threat Analysis
The attack crafts adversarial visual inputs (images with embedded text overlays that function as adversarial patches) to manipulate VLM outputs at inference time, consistent with the dual-tagging rule for adversarial visual inputs that jailbreak VLMs or manipulate their outputs.