Multimodal Prompt Injection Attacks: Risks and Defenses for Modern LLMs
Andrew Yeo 1, Daeseon Choi 2
Published on arXiv
2509.05883
Prompt Injection
OWASP LLM Top 10 — LLM01
Sensitive Information Disclosure
OWASP LLM Top 10 — LLM06
Key Finding
All eight commercial LLMs were exploitable via at least one prompt injection category when relying solely on built-in safeguards, with Claude 3 showing relatively greater but still insufficient robustness.
Large Language Models (LLMs) have seen rapid adoption in recent years, with industries increasingly relying on them to maintain a competitive advantage. These models excel at interpreting user instructions and generating human-like responses, leading to their integration across diverse domains, including consulting and information retrieval. However, their widespread deployment also introduces substantial security risks, most notably in the form of prompt injection and jailbreak attacks. To systematically evaluate LLM vulnerabilities -- particularly to external prompt injection -- we conducted a series of experiments on eight commercial models. Each model was tested without supplementary sanitization, relying solely on its built-in safeguards. The results exposed exploitable weaknesses and emphasized the need for stronger security measures. Four categories of attacks were examined: direct injection, indirect (external) injection, image-based injection, and prompt leakage. Comparative analysis indicated that Claude 3 demonstrated relatively greater robustness; nevertheless, empirical findings confirm that additional defenses, such as input normalization, remain necessary to achieve reliable protection.
Key Contributions
- Empirical evaluation of eight commercial LLMs against four prompt injection categories (direct, indirect, image-based, prompt leakage) without supplementary sanitization
- Structured taxonomy classifying prompt injection techniques by objective and delivery vector
- Comparative robustness analysis finding Claude 3 relatively more resistant, while confirming all tested models remain exploitable