ML Security Papers

defense 2025

Detecting Prompt Injection Attacks Against Application Using Classifiers

Safwan Shaheer , G.M. Refatul Islam , Mohammad Rafid Hamid , Md. Abrar Faiaz Khan , Md. Omar Faruk , Yaseen Nur

BRAC University

0 citations · 14 references · arXiv

α

Published on arXiv

2512.12583

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Multiple classifiers trained on a curated prompt injection dataset improve detection of malicious prompts in LLM-integrated applications, though specific quantitative results are not reported in the available text.

Prompt injection attacks can compromise the security and stability of critical systems, from infrastructure to large web applications. This work curates and augments a prompt injection dataset based on the HackAPrompt Playground Submissions corpus and trains several classifiers, including LSTM, feed forward neural networks, Random Forest, and Naive Bayes, to detect malicious prompts in LLM integrated web applications. The proposed approach improves prompt injection detection and mitigation, helping protect targeted applications and systems.

Key Contributions

Curated and augmented a prompt injection dataset derived from the HackAPrompt-Playground-Submissions corpus on HuggingFace
Trained and compared multiple classifiers (LSTM, FNN, Random Forest, Naive Bayes) for detecting malicious prompts
Proposed a detection and mitigation pipeline for prompt injection in LLM-integrated web applications

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llmrnntraditional_ml

Threat Tags

inference_timeblack_box

Datasets

HackAPrompt-Playground-Submissions

Applications

llm-integrated web applicationschatbot systems

Read PDF arXiv DOI

Similar Papers

The Mirror Design Pattern: Strict Data Geometry over Model Scale for Prompt Injection Detection

PromptScreen: Efficient Jailbreak Mitigation Using Semantic Linear Classification in a Multi-Staged Pipeline

DeepContext: Stateful Real-Time Detection of Multi-Turn Adversarial Intent Drift in LLMs

Towards Unsupervised Adversarial Document Detection in Retrieval Augmented Generation Systems

Cross-Service Threat Intelligence in LLM Services using Privacy-Preserving Fingerprints

AEGIS : Automated Co-Evolutionary Framework for Guarding Prompt Injections Schema

Prompt Injection Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching

Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?