
Adaptive and Robust Data Poisoning Detection and Sanitization in Wearable IoT Systems using Large Language Models

W.K.M Mithsara 1, Ning Yang 1, Ahmed Imteaj 2, Hussein Zangoti 3, Abdur R. Shahid 1



Published on arXiv

arXiv:2511.02894

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

LLMs using role-play and chain-of-thought prompting can detect and sanitize data poisoning in HAR systems without large labeled training datasets, demonstrating practical viability via accuracy, latency, and communication cost metrics.


The widespread integration of wearable sensing devices in Internet of Things (IoT) ecosystems, particularly in healthcare, smart homes, and industrial applications, has required robust human activity recognition (HAR) techniques to improve functionality and user experience. Although machine learning models have advanced HAR, they are increasingly susceptible to data poisoning attacks that compromise the data integrity and reliability of these systems. Conventional approaches to defending against such attacks often require extensive task-specific training with large, labeled datasets, which limits adaptability in dynamic IoT environments. This work proposes a novel framework that uses large language models (LLMs) to perform poisoning detection and sanitization in HAR systems, utilizing zero-shot, one-shot, and few-shot learning paradigms. Our approach incorporates "role-play" prompting, whereby the LLM assumes the role of an expert to contextualize and evaluate sensor anomalies, and "think step-by-step" reasoning, guiding the LLM to infer poisoning indicators in the raw sensor data and plausible clean alternatives. These strategies minimize reliance on extensive dataset curation and enable robust, adaptable real-time defense mechanisms. We perform an extensive evaluation of the framework, quantifying detection accuracy, sanitization quality, latency, and communication cost, thus demonstrating the practicality and effectiveness of LLMs in improving the security and reliability of wearable IoT systems.
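The role-play and step-by-step prompting strategy described in the abstract can be sketched as a simple prompt builder. The role text, verdict labels, and few-shot structure below are illustrative assumptions, not the paper's exact prompts:

```python
# Hypothetical sketch of role-play + chain-of-thought prompting for
# poisoning detection/sanitization. Wording is assumed, not from the paper.

def build_detection_prompt(window, examples=()):
    """Build a prompt asking an LLM to flag and sanitize a poisoned HAR window.

    window   -- list of (accel_x, accel_y, accel_z) sensor readings
    examples -- optional (window, verdict) pairs for one-/few-shot prompting
                (an empty tuple corresponds to zero-shot)
    """
    lines = [
        # Role-play: the LLM acts as a wearable-sensor security expert.
        "You are an expert in wearable IoT sensor data and data-poisoning attacks.",
        # Chain-of-thought: ask for reasoning before the verdict.
        "Think step by step: check each reading for implausible magnitudes,",
        "abrupt discontinuities, or values inconsistent with human motion.",
        "Then output a verdict (CLEAN or POISONED) and, if poisoned,",
        "a plausible clean replacement for each suspect reading.",
    ]
    for ex_window, verdict in examples:
        lines.append(f"Example window: {ex_window} -> {verdict}")
    lines.append(f"Window to analyze: {window}")
    return "\n".join(lines)

# One-shot usage: one labeled example plus a window with an obvious spike.
prompt = build_detection_prompt(
    [(0.1, 9.8, 0.2), (0.1, 9.7, 0.3), (85.0, -90.0, 120.0)],
    examples=[([(0.0, 9.8, 0.1)], "CLEAN")],
)
```

The returned string would then be sent to the LLM; keeping the builder pure makes it easy to swap between zero-, one-, and few-shot modes by varying `examples`.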


Key Contributions

  • LLM-based zero/one/few-shot framework for detecting and sanitizing data poisoning in HAR sensor streams without task-specific labeled datasets
  • Role-play prompting combined with chain-of-thought reasoning to guide LLMs in identifying poisoning indicators and inferring clean sensor values
  • End-to-end evaluation covering detection accuracy, sanitization quality, latency, and communication cost in a wearable IoT context

🛡️ Threat Analysis

Data Poisoning Attack

The paper's primary focus is detecting and sanitizing data poisoning attacks on ML-based HAR systems in wearable IoT; the entire framework is a defense against training-data corruption.
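To make the threat concrete, here is a minimal sketch of a spike-injection poisoning attack on accelerometer windows, paired with the kind of crude magnitude plausibility check an LLM is prompted to reason about. The attack parameters and threshold are illustrative assumptions, not the paper's exact threat model:

```python
# Illustrative spike-injection poisoning of HAR accelerometer data.
# Rate, spike size, and threshold are assumptions for demonstration only.
import math
import random

def poison_window(window, rate=0.3, spike=80.0, seed=0):
    """Corrupt a fraction `rate` of readings with large additive spikes."""
    rng = random.Random(seed)
    poisoned = []
    for reading in window:
        if rng.random() < rate:
            # Add +/- spike independently to each axis.
            poisoned.append(tuple(v + rng.choice([-spike, spike]) for v in reading))
        else:
            poisoned.append(reading)
    return poisoned

def magnitude(reading):
    """Euclidean norm of one 3-axis accelerometer reading (m/s^2)."""
    return math.sqrt(sum(v * v for v in reading))

clean = [(0.1, 9.8, 0.2)] * 10             # resting wrist: ~1 g on one axis
dirty = poison_window(clean)
flags = [magnitude(r) > 30.0 for r in dirty]  # crude plausibility threshold
```

A fixed threshold like this catches only gross spikes; subtler poisoning (small biases, label flips) is exactly where the paper's LLM-based contextual reasoning is meant to go beyond simple rules.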


Details

Domains
timeseries, nlp
Model Types
llm, transformer, traditional_ml
Threat Tags
training_time
Applications
human activity recognition, wearable iot, healthcare sensing