defense 2025

Detecting Prompt Injection Attacks Against Application Using Classifiers

Safwan Shaheer , G.M. Refatul Islam , Mohammad Rafid Hamid , Md. Abrar Faiaz Khan , Md. Omar Faruk , Yaseen Nur

0 citations · 14 references · arXiv

α

Published on arXiv

2512.12583

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Multiple classifiers trained on a curated prompt injection dataset improve detection of malicious prompts in LLM-integrated applications, though specific quantitative results are not reported in the available text.


Prompt injection attacks can compromise the security and stability of critical systems, from infrastructure to large web applications. This work curates and augments a prompt injection dataset based on the HackAPrompt Playground Submissions corpus and trains several classifiers, including LSTM, feed forward neural networks, Random Forest, and Naive Bayes, to detect malicious prompts in LLM integrated web applications. The proposed approach improves prompt injection detection and mitigation, helping protect targeted applications and systems.


Key Contributions

  • Curated and augmented a prompt injection dataset derived from the HackAPrompt-Playground-Submissions corpus on HuggingFace
  • Trained and compared multiple classifiers (LSTM, FNN, Random Forest, Naive Bayes) for detecting malicious prompts
  • Proposed a detection and mitigation pipeline for prompt injection in LLM-integrated web applications

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llmrnntraditional_ml
Threat Tags
inference_timeblack_box
Datasets
HackAPrompt-Playground-Submissions
Applications
llm-integrated web applicationschatbot systems