attack arXiv Oct 10, 2025 · Oct 2025
Ruizhe Zhu · ETH Zürich
Embeds readable text instructions inside images to hijack VLM behavior, outperforming gradient-based attacks with far less compute
Input Manipulation Attack Prompt Injection visionnlpmultimodal
The widespread application of large vision language models has significantly raised safety concerns. In this project, we investigate text prompt injection, a simple yet effective method to mislead these models. We developed an algorithm for this type of attack and demonstrated its effectiveness and efficiency through experiments. Compared to other attack methods, our approach is particularly effective for large models without high demand for computational resources.
vlm llm ETH Zürich