
Adaptive and Robust Data Poisoning Detection and Sanitization in Wearable IoT Systems using Large Language Models

W.K.M Mithsara 1, Ning Yang 1, Ahmed Imteaj 2, Hussein Zangoti 3, Abdur R. Shahid 1



Published on arXiv

arXiv:2511.02894

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

LLMs using role-play and chain-of-thought prompting can detect and sanitize data poisoning in HAR systems without large labeled training datasets, demonstrating practical viability via accuracy, latency, and communication cost metrics.


The widespread integration of wearable sensing devices in Internet of Things (IoT) ecosystems, particularly in healthcare, smart homes, and industrial applications, has required robust human activity recognition (HAR) techniques to improve functionality and user experience. Although machine learning models have advanced HAR, they are increasingly susceptible to data poisoning attacks that compromise the data integrity and reliability of these systems. Conventional approaches to defending against such attacks often require extensive task-specific training with large, labeled datasets, which limits adaptability in dynamic IoT environments. This work proposes a novel framework that uses large language models (LLMs) to perform poisoning detection and sanitization in HAR systems, utilizing zero-shot, one-shot, and few-shot learning paradigms. Our approach incorporates "role-play" prompting, whereby the LLM assumes the role of an expert to contextualize and evaluate sensor anomalies, and "think step-by-step" reasoning, guiding the LLM to infer poisoning indicators in the raw sensor data and plausible clean alternatives. These strategies minimize reliance on extensive dataset curation and enable robust, adaptable real-time defense mechanisms. We perform an extensive evaluation of the framework, quantifying detection accuracy, sanitization quality, latency, and communication cost, thus demonstrating the practicality and effectiveness of LLMs in improving the security and reliability of wearable IoT systems.
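The role-play and step-by-step prompting strategy described in the abstract can be sketched as a simple prompt builder. The role text, verdict labels, and few-shot structure below are illustrative assumptions, not the paper's exact prompts:

```python
# Hypothetical sketch of role-play + chain-of-thought prompting for
# poisoning detection/sanitization. Wording is assumed, not from the paper.

def build_detection_prompt(window, examples=()):
    """Build a prompt asking an LLM to flag and sanitize a poisoned HAR window.

    window   -- list of (accel_x, accel_y, accel_z) sensor readings
    examples -- optional (window, verdict) pairs for one-/few-shot prompting
                (an empty tuple corresponds to zero-shot)
    """
    lines = [
        # Role-play: the LLM acts as a wearable-sensor security expert.
        "You are an expert in wearable IoT sensor data and data-poisoning attacks.",
        # Chain-of-thought: ask for reasoning before the verdict.
        "Think step by step: check each reading for implausible magnitudes,",
        "abrupt discontinuities, or values inconsistent with human motion.",
        "Then output a verdict (CLEAN or POISONED) and, if poisoned,",
        "a plausible clean replacement for each suspect reading.",
    ]
    for ex_window, verdict in examples:
        lines.append(f"Example window: {ex_window} -> {verdict}")
    lines.append(f"Window to analyze: {window}")
    return "\n".join(lines)

# One-shot usage: one labeled example plus a window with an obvious spike.
prompt = build_detection_prompt(
    [(0.1, 9.8, 0.2), (0.1, 9.7, 0.3), (85.0, -90.0, 120.0)],
    examples=[([(0.0, 9.8, 0.1)], "CLEAN")],
)
```

The returned string would then be sent to the LLM; keeping the builder pure makes it easy to swap between zero-, one-, and few-shot modes by varying `examples`.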


Key Contributions

  • LLM-based zero/one/few-shot framework for detecting and sanitizing data poisoning in HAR sensor streams without task-specific labeled datasets
  • Role-play prompting combined with chain-of-thought reasoning to guide LLMs in identifying poisoning indicators and inferring clean sensor values
  • End-to-end evaluation covering detection accuracy, sanitization quality, latency, and communication cost in a wearable IoT context

🛡️ Threat Analysis

Data Poisoning Attack

The paper's primary focus is detecting and sanitizing data poisoning attacks on ML-based HAR systems in wearable IoT; the entire framework is a defense against training-data corruption.
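To make the threat concrete, here is a minimal sketch of a spike-injection poisoning attack on accelerometer windows, paired with the kind of crude magnitude plausibility check an LLM is prompted to reason about. The attack parameters and threshold are illustrative assumptions, not the paper's exact threat model:

```python
# Illustrative spike-injection poisoning of HAR accelerometer data.
# Rate, spike size, and threshold are assumptions for demonstration only.
import math
import random

def poison_window(window, rate=0.3, spike=80.0, seed=0):
    """Corrupt a fraction `rate` of readings with large additive spikes."""
    rng = random.Random(seed)
    poisoned = []
    for reading in window:
        if rng.random() < rate:
            # Add +/- spike independently to each axis.
            poisoned.append(tuple(v + rng.choice([-spike, spike]) for v in reading))
        else:
            poisoned.append(reading)
    return poisoned

def magnitude(reading):
    """Euclidean norm of one 3-axis accelerometer reading (m/s^2)."""
    return math.sqrt(sum(v * v for v in reading))

clean = [(0.1, 9.8, 0.2)] * 10             # resting wrist: ~1 g on one axis
dirty = poison_window(clean)
flags = [magnitude(r) > 30.0 for r in dirty]  # crude plausibility threshold
```

A fixed threshold like this catches only gross spikes; subtler poisoning (small biases, label flips) is exactly where the paper's LLM-based contextual reasoning is meant to go beyond simple rules.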


Details

Domains
timeseries, nlp
Model Types
llm, transformer, traditional_ml
Threat Tags
training_time
Applications
human activity recognition, wearable iot, healthcare sensing