RAG-targeted Adversarial Attack on LLM-based Threat Detection and Mitigation Framework
Seif Ikbarieh, Kshitiz Aryal, Maanak Gupta
Published on arXiv (arXiv:2511.06212)
Data Poisoning Attack
OWASP ML Top 10 — ML02
Training Data Poisoning
OWASP LLM Top 10 — LLM03
Key Finding
Small, meaning-preserving word-level perturbations injected into a RAG knowledge base measurably degrade the mitigation specificity of an LLM (ChatGPT-5 Thinking) by weakening the linkage between observed network traffic features and attack behavior.
TextFooler-based RAG Poisoning
Novel technique introduced
The rapid expansion of the Internet of Things (IoT) is reshaping communication and operational practices across industries, but it also broadens the attack surface and increases susceptibility to security breaches. Artificial Intelligence has become a valuable solution in securing IoT networks, with Large Language Models (LLMs) enabling automated attack behavior analysis and mitigation suggestion in Network Intrusion Detection Systems (NIDS). Despite advancements, the use of LLMs in such systems further expands the attack surface, putting entire networks at risk by introducing vulnerabilities such as prompt injection and data poisoning. In this work, we attack an LLM-based IoT attack analysis and mitigation framework to test its adversarial robustness. We construct an attack description dataset and use it in a targeted data poisoning attack that applies word-level, meaning-preserving perturbations to corrupt the Retrieval-Augmented Generation (RAG) knowledge base of the framework. We then compare pre-attack and post-attack mitigation responses from the target model, ChatGPT-5 Thinking, to measure the impact of the attack on model performance, using an established evaluation rubric designed for human experts and judge LLMs. Our results show that small perturbations degrade LLM performance by weakening the linkage between observed network traffic features and attack behavior, and by reducing the specificity and practicality of recommended mitigations for resource-constrained devices.
Key Contributions
- Constructs an IoT attack description dataset covering 18 attack types via prompt engineering, used as RAG knowledge base content.
- Proposes a transfer-based RAG poisoning attack: fine-tunes a BERT surrogate on the IoT descriptions, generates meaning-preserving word-level perturbations (TextFooler with POS constraints), and injects the adversarially modified documents into the RAG knowledge base.
- Evaluates pre-attack vs. post-attack degradation of ChatGPT-5 Thinking mitigation quality using human expert and judge-LLM rubrics, demonstrating measurable performance drops from small perturbations.
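The word-level perturbation step above can be sketched in miniature. This is an illustrative toy, not the paper's pipeline: the synonym table and POS tags below are hardcoded stand-ins, whereas the actual attack ranks candidate substitutions against a fine-tuned BERT surrogate in the TextFooler style.

```python
# Toy sketch of word-level, meaning-preserving substitution with a
# part-of-speech constraint (TextFooler-style). All entries are
# hypothetical examples, not the paper's dataset or models.

# Stand-in synonym table keyed by (word, POS) so swaps preserve grammar.
SYNONYMS = {
    ("floods", "VERB"): ["saturates", "overwhelms"],
    ("malicious", "ADJ"): ["hostile", "harmful"],
    ("traffic", "NOUN"): ["packets"],
}

# Stand-in POS tagger (a real pipeline would use an NLP tagger).
TOY_POS = {"floods": "VERB", "malicious": "ADJ", "traffic": "NOUN"}

def perturb(text: str, budget: int = 2) -> str:
    """Replace up to `budget` words with same-POS synonyms."""
    out, used = [], 0
    for word in text.split():
        candidates = SYNONYMS.get((word, TOY_POS.get(word)), [])
        if candidates and used < budget:
            # TextFooler would pick the candidate that most increases the
            # surrogate model's loss; here we just take the first one.
            out.append(candidates[0])
            used += 1
        else:
            out.append(word)
    return " ".join(out)

print(perturb("the attacker floods the target with malicious traffic"))
# -> "the attacker saturates the target with hostile traffic"
```

A real attack additionally filters candidates by semantic similarity (e.g., sentence-embedding distance) so the perturbed description still reads as a valid attack account and survives retrieval.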
🛡️ Threat Analysis
The paper's primary contribution is a targeted data-poisoning attack that injects word-level, semantically preserving adversarial perturbations into the RAG knowledge base of the LLM framework, directly corrupting the retrieval data to degrade downstream model outputs.
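A minimal sketch of why poisoning the knowledge base degrades answers: the retriever returns the top-scoring document for a query, so an injected perturbed copy that still matches the query can be retrieved in place of the clean description. The bag-of-words retriever and document texts below are illustrative assumptions, not the paper's actual RAG stack or corpus.

```python
# Toy RAG retrieval showing a poisoned document outranking the clean one.
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str]) -> str:
    """Return the single highest-scoring document for the query."""
    qv = Counter(query.lower().split())
    return max(docs, key=lambda d: cosine(qv, Counter(d.lower().split())))

knowledge_base = [
    "syn flood attack sends many tcp syn packets to exhaust server state",
    "dns tunneling hides data inside dns queries",
]

# Attacker injects a perturbed copy that matches the query even better
# while weakening the feature-to-behavior linkage the LLM relies on.
poisoned = "syn flood attack sends many tcp syn packets packets syn flood"
knowledge_base.append(poisoned)

print(retrieve("tcp syn flood packets", knowledge_base))  # -> the poisoned doc
```

Because the LLM's mitigation answer is grounded in whatever the retriever surfaces, corrupting even a few retrieved passages propagates directly into less specific, less practical recommendations.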