RAG-targeted Adversarial Attack on LLM-based Threat Detection and Mitigation Framework
Seif Ikbarieh, Kshitiz Aryal, Maanak Gupta
Published on arXiv (arXiv:2511.06212)
Data Poisoning Attack
OWASP ML Top 10 — ML02
Training Data Poisoning
OWASP LLM Top 10 — LLM03
Key Finding
Small, meaning-preserving word-level perturbations injected into a RAG knowledge base measurably degrade the mitigation specificity of an LLM (ChatGPT-5 Thinking) by weakening the linkage between observed network traffic features and attack behavior.
TextFooler-based RAG Poisoning
Novel technique introduced
The rapid expansion of the Internet of Things (IoT) is reshaping communication and operational practices across industries, but it also broadens the attack surface and increases susceptibility to security breaches. Artificial Intelligence has become a valuable solution in securing IoT networks, with Large Language Models (LLMs) enabling automated attack behavior analysis and mitigation suggestion in Network Intrusion Detection Systems (NIDS). Despite advancements, the use of LLMs in such systems further expands the attack surface, putting entire networks at risk by introducing vulnerabilities such as prompt injection and data poisoning. In this work, we attack an LLM-based IoT attack analysis and mitigation framework to test its adversarial robustness. We construct an attack description dataset and use it in a targeted data poisoning attack that applies word-level, meaning-preserving perturbations to corrupt the Retrieval-Augmented Generation (RAG) knowledge base of the framework. We then compare pre-attack and post-attack mitigation responses from the target model, ChatGPT-5 Thinking, to measure the impact of the attack on model performance, using an established evaluation rubric designed for human experts and judge LLMs. Our results show that small perturbations degrade LLM performance by weakening the linkage between observed network traffic features and attack behavior, and by reducing the specificity and practicality of recommended mitigations for resource-constrained devices.
Key Contributions
- Constructs an IoT attack description dataset covering 18 attack types via prompt engineering, used as RAG knowledge base content.
- Proposes a transfer-based RAG poisoning attack: fine-tunes a BERT surrogate on the IoT descriptions, generates meaning-preserving word-level perturbations (TextFooler with POS constraints), and injects the adversarially modified documents into the RAG knowledge base.
- Evaluates pre-attack vs. post-attack degradation of ChatGPT-5 Thinking mitigation quality using human expert and judge-LLM rubrics, demonstrating measurable performance drops from small perturbations.
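The word-level perturbation step above can be sketched in miniature. This is an illustrative toy, not the paper's pipeline: the synonym table and POS tags below are hardcoded stand-ins, whereas the actual attack ranks candidate substitutions against a fine-tuned BERT surrogate in the TextFooler style.

```python
# Toy sketch of word-level, meaning-preserving substitution with a
# part-of-speech constraint (TextFooler-style). All entries are
# hypothetical examples, not the paper's dataset or models.

# Stand-in synonym table keyed by (word, POS) so swaps preserve grammar.
SYNONYMS = {
    ("floods", "VERB"): ["saturates", "overwhelms"],
    ("malicious", "ADJ"): ["hostile", "harmful"],
    ("traffic", "NOUN"): ["packets"],
}

# Stand-in POS tagger (a real pipeline would use an NLP tagger).
TOY_POS = {"floods": "VERB", "malicious": "ADJ", "traffic": "NOUN"}

def perturb(text: str, budget: int = 2) -> str:
    """Replace up to `budget` words with same-POS synonyms."""
    out, used = [], 0
    for word in text.split():
        candidates = SYNONYMS.get((word, TOY_POS.get(word)), [])
        if candidates and used < budget:
            # TextFooler would pick the candidate that most increases the
            # surrogate model's loss; here we just take the first one.
            out.append(candidates[0])
            used += 1
        else:
            out.append(word)
    return " ".join(out)

print(perturb("the attacker floods the target with malicious traffic"))
# -> "the attacker saturates the target with hostile traffic"
```

A real attack additionally filters candidates by semantic similarity (e.g., sentence-embedding distance) so the perturbed description still reads as a valid attack account and survives retrieval.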
🛡️ Threat Analysis
The paper's primary contribution is a targeted data-poisoning attack that injects word-level, semantically preserving adversarial perturbations into the RAG knowledge base of the LLM framework, directly corrupting the retrieval data to degrade downstream model outputs.
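A minimal sketch of why poisoning the knowledge base degrades answers: the retriever returns the top-scoring document for a query, so an injected perturbed copy that still matches the query can be retrieved in place of the clean description. The bag-of-words retriever and document texts below are illustrative assumptions, not the paper's actual RAG stack or corpus.

```python
# Toy RAG retrieval showing a poisoned document outranking the clean one.
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str]) -> str:
    """Return the single highest-scoring document for the query."""
    qv = Counter(query.lower().split())
    return max(docs, key=lambda d: cosine(qv, Counter(d.lower().split())))

knowledge_base = [
    "syn flood attack sends many tcp syn packets to exhaust server state",
    "dns tunneling hides data inside dns queries",
]

# Attacker injects a perturbed copy that matches the query even better
# while weakening the feature-to-behavior linkage the LLM relies on.
poisoned = "syn flood attack sends many tcp syn packets packets syn flood"
knowledge_base.append(poisoned)

print(retrieve("tcp syn flood packets", knowledge_base))  # -> the poisoned doc
```

Because the LLM's mitigation answer is grounded in whatever the retriever surfaces, corrupting even a few retrieved passages propagates directly into less specific, less practical recommendations.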