defense arXiv Dec 14, 2025 · Dec 2025
Safwan Shaheer, G. M. Refatul Islam, Mohammad Rafid Hamid et al. · BRAC University
Trains LSTM, FNN, Random Forest, and Naive Bayes classifiers to detect prompt injection attacks in LLM-integrated web applications
Prompt Injection nlp
Prompt injection attacks can compromise the security and stability of critical systems, from infrastructure to large web applications. This work curates and augments a prompt injection dataset based on the HackAPrompt Playground Submissions corpus and trains several classifiers, including LSTM, feed forward neural networks, Random Forest, and Naive Bayes, to detect malicious prompts in LLM integrated web applications. The proposed approach improves prompt injection detection and mitigation, helping protect targeted applications and systems.
llm rnn traditional_ml BRAC University
defense arXiv Dec 18, 2025 · Dec 2025
Safwan Shaheer, G.M. Refatul Islam, Mohammad Rafid Hamid et al. · BRAC University
Defends LLaMA models from goal-hijacking via iterative CoT-seeded prompt defense generation, reducing attack success rates
Prompt Injection nlp
In this fast-evolving area of LLMs, our paper discusses the significant security risk presented by prompt injection attacks. It focuses on small open-sourced models, specifically the LLaMA family of models. We introduce novel defense mechanisms capable of generating automatic defenses and systematically evaluate said generated defenses against a comprehensive set of benchmarked attacks. Thus, we empirically demonstrated the improvement proposed by our approach in mitigating goal-hijacking vulnerabilities in LLMs. Our work recognizes the increasing relevance of small open-sourced LLMs and their potential for broad deployments on edge devices, aligning with future trends in LLM applications. We contribute to the greater ecosystem of open-source LLMs and their security in the following: (1) assessing present prompt-based defenses against the latest attacks, (2) introducing a new framework using a seed defense (Chain Of Thoughts) to refine the defense prompts iteratively, and (3) showing significant improvements in detecting goal hijacking attacks. Out strategies significantly reduce the success rates of the attacks and false detection rates while at the same time effectively detecting goal-hijacking capabilities, paving the way for more secure and efficient deployments of small and open-source LLMs in resource-constrained environments.
llm transformer BRAC University