benchmark 2025

Prompt Injection as an Emerging Threat: Evaluating the Resilience of Large Language Models

Daniyal Ganiuly 1, Assel Smaiyl 2

0 citations · arXiv

α

Published on arXiv

2511.01634

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

GPT-4 achieved the best resilience (RDR=9.8%, SCR=96.4%) while all models remained partially vulnerable to prompt injection, with open-source models showing greater degradation.

Resilience Degradation Index / Safety Compliance Coefficient / Instructional Integrity Metric (RDI/SCC/IIM)

Novel technique introduced


Large Language Models (LLMs) are increasingly used in intelligent systems that perform reasoning, summarization, and code generation. Their ability to follow natural-language instructions, while powerful, also makes them vulnerable to a new class of attacks known as prompt injection. In these attacks, hidden or malicious instructions are inserted into user inputs or external content, causing the model to ignore its intended task or produce unsafe responses. This study proposes a unified framework for evaluating how resistant Large Language Models (LLMs) are to prompt injection attacks. The framework defines three complementary metrics such as the Resilience Degradation Index (RDI), Safety Compliance Coefficient (SCC), and Instructional Integrity Metric (IIM) to jointly measure robustness, safety, and semantic stability. We evaluated four instruction-tuned models (GPT-4, GPT-4o, LLaMA-3 8B Instruct, and Flan-T5-Large) on five common language tasks: question answering, summarization, translation, reasoning, and code generation. Results show that GPT-4 performs best overall, while open-weight models remain more vulnerable. The findings highlight that strong alignment and safety tuning are more important for resilience than model size alone. Results show that all models remain partially vulnerable, especially to indirect and direct-override attacks. GPT-4 achieved the best overall resilience (RDR = 9.8 %, SCR = 96.4 %), while open-source models exhibited higher performance degradation and lower safety scores. The findings demonstrate that alignment strength and safety tuning play a greater role in resilience than model size alone. The proposed framework offers a structured, reproducible approach for assessing model robustness and provides practical insights for improving LLM safety and reliability.


Key Contributions

  • Unified evaluation framework with three complementary metrics (RDI, SCC, IIM) measuring robustness, safety, and semantic stability under prompt injection
  • Systematic evaluation of four instruction-tuned LLMs (GPT-4, GPT-4o, LLaMA-3 8B Instruct, Flan-T5-Large) across five NLP tasks under prompt injection
  • Empirical finding that alignment strength and safety tuning matter more for prompt injection resilience than raw model size

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llmtransformer
Threat Tags
black_boxinference_time
Applications
question answeringsummarizationtranslationreasoningcode generation