benchmark 2025

Multimodal Prompt Injection Attacks: Risks and Defenses for Modern LLMs

Andrew Yeo 1, Daeseon Choi 2

0 citations

α

Published on arXiv

2509.05883

Prompt Injection

OWASP LLM Top 10 — LLM01

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

All eight commercial LLMs were exploitable via at least one prompt injection category when relying solely on built-in safeguards, with Claude 3 showing relatively greater but still insufficient robustness.


Large Language Models (LLMs) have seen rapid adoption in recent years, with industries increasingly relying on them to maintain a competitive advantage. These models excel at interpreting user instructions and generating human-like responses, leading to their integration across diverse domains, including consulting and information retrieval. However, their widespread deployment also introduces substantial security risks, most notably in the form of prompt injection and jailbreak attacks. To systematically evaluate LLM vulnerabilities -- particularly to external prompt injection -- we conducted a series of experiments on eight commercial models. Each model was tested without supplementary sanitization, relying solely on its built-in safeguards. The results exposed exploitable weaknesses and emphasized the need for stronger security measures. Four categories of attacks were examined: direct injection, indirect (external) injection, image-based injection, and prompt leakage. Comparative analysis indicated that Claude 3 demonstrated relatively greater robustness; nevertheless, empirical findings confirm that additional defenses, such as input normalization, remain necessary to achieve reliable protection.


Key Contributions

  • Empirical evaluation of eight commercial LLMs against four prompt injection categories (direct, indirect, image-based, prompt leakage) without supplementary sanitization
  • Structured taxonomy classifying prompt injection techniques by objective and delivery vector
  • Comparative robustness analysis finding Claude 3 relatively more resistant, while confirming all tested models remain exploitable

🛡️ Threat Analysis


Details

Domains
nlpmultimodal
Model Types
llmvlm
Threat Tags
black_boxinference_time
Applications
llm-based consultinginformation retrievalhealthcare systemsenterprise llm deployments