defense 2025

A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks

0 citations

Published on arXiv

2509.14285

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Multi-agent pipeline reduces prompt injection Attack Success Rate from 30% (ChatGLM) and 20% (Llama2) to 0% across all 400 tested attack instances.

Multi-Agent LLM Defense Pipeline

Novel technique introduced

Prompt injection attacks represent a major vulnerability in Large Language Model (LLM) deployments, where malicious instructions embedded in user inputs can override system prompts and induce unintended behaviors. This paper presents a novel multi-agent defense framework that employs specialized LLM agents in coordinated pipelines to detect and neutralize prompt injection attacks in real-time. We evaluate our approach using two distinct architectures: a sequential chain-of-agents pipeline and a hierarchical coordinator-based system. Our comprehensive evaluation on 55 unique prompt injection attacks, grouped into 8 categories and totaling 400 attack instances across two LLM platforms (ChatGLM and Llama2), demonstrates significant security improvements. Without defense mechanisms, baseline Attack Success Rates (ASR) reached 30% for ChatGLM and 20% for Llama2. Our multi-agent pipeline achieved 100% mitigation, reducing ASR to 0% across all tested scenarios. The framework demonstrates robustness across multiple attack categories including direct overrides, code execution attempts, data exfiltration, and obfuscation techniques, while maintaining system functionality for legitimate queries.

Key Contributions

Two complementary multi-agent defense architectures (sequential chain-of-agents and hierarchical coordinator-based) for real-time prompt injection detection and neutralization
Comprehensive evaluation dataset (HPI_ATTACK_DATASET) of 55 unique prompt injection attacks across 8 categories, totaling 400 instances on ChatGLM and Llama2
Empirical demonstration of 100% attack mitigation (ASR reduced from 30%/20% to 0%) while preserving legitimate system functionality

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

inference_timeblack_box

Datasets

HPI_ATTACK_DATASET

Applications

llm chatbotsllm-powered applicationsautomated decision systems

Read PDF arXiv

A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

Prompt Attack Detection with LLM-as-a-Judge and Mixture-of-Models

Design and Implementation of a Secure RAG-Enhanced AI Chatbot for Smart Tourism Customer Service: Defending Against Prompt Injection Attacks -- A Case Study of Hsinchu, Taiwan

Attacks by Content: Automated Fact-checking is an AI Security Issue

LLM Reinforcement in Context

Active Honeypot Guardrail System: Probing and Confirming Multi-Turn LLM Jailbreaks

Incentive-Aligned Multi-Source LLM Summaries

BlueCodeAgent: A Blue Teaming Agent Enabled by Automated Red Teaming for CodeGen AI

Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?