defense 2026

dataRLsec: Safety, Security, and Reliability With Robust Offline Reinforcement Learning for DPAs

Shriram KS Pandian, Naresh Kshetri

0 citations · 13 references · International Conference on Ap...


Published on arXiv

2601.01289

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

Proposes a 4-stage DWBC algorithm combining weighted hash verification with robust offline RL to mitigate data poisoning attacks, with preliminary evaluation on MuJoCo HalfCheetah.

DWBC (Density-ratio Weighted Behavioral Cloning)

Novel technique introduced


Data poisoning attacks (DPAs) are becoming more common as artificial intelligence (AI), machine learning (ML), and deep learning (DL) algorithms see wider use. Hackers and penetration testers increasingly inject malicious content into training data (and into testing data), which leads to false results that are very hard to inspect and predict. We analyze several recent technologies used for DPAs, from deep reinforcement learning to federated learning, along with their safety, security, and countermeasures. The problem setup and problem estimation are shown in the MuJoCo environment by comparing HalfCheetah performance before and after the dataset is poisoned. We also analyze several risks associated with DPAs and the falsification of medical data, covering popular data poisoning attacks and popular data defenses. We propose robust offline reinforcement learning (offline RL) for safety and reliability, combining weighted hash verification with a density-ratio weighted behavioral cloning (DWBC) algorithm. The four stages of the proposed algorithm (Stage 0, Stage 1, Stage 2, and Stage 3) are described with respect to offline RL, safety, and security for DPAs. The conclusion and future scope state the intent to combine DWBC with other data defense strategies to counter and protect against future contamination cyberattacks.
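To make the "before and after poisoning" comparison concrete, here is a minimal sketch of how an offline dataset might be corrupted. It is an illustration only: the data is synthetic rather than real HalfCheetah rollouts, and the attack (random actions plus sign-flipped rewards on a fraction of transitions) and the helper `poison_dataset` are assumptions, not the attack used in the paper.

```python
import numpy as np

def poison_dataset(actions, rewards, fraction, rng):
    """Return poisoned copies plus a boolean mask of the corrupted rows.

    Hypothetical attack: on a random `fraction` of transitions, replace
    the action with uniform noise in [-1, 1] and flip the reward's sign.
    """
    n = len(rewards)
    n_bad = int(round(fraction * n))
    idx = rng.choice(n, size=n_bad, replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True

    bad_actions = actions.copy()
    bad_rewards = rewards.copy()
    bad_actions[mask] = rng.uniform(-1.0, 1.0, size=(n_bad, actions.shape[1]))
    bad_rewards[mask] = -bad_rewards[mask]
    return bad_actions, bad_rewards, mask

rng = np.random.default_rng(0)
# Synthetic stand-in for an offline dataset (NOT real HalfCheetah data);
# 6 matches the HalfCheetah action dimension.
actions = rng.uniform(-1.0, 1.0, size=(1000, 6))
rewards = rng.uniform(0.5, 1.5, size=1000)  # strictly positive rewards

p_actions, p_rewards, mask = poison_dataset(actions, rewards, 0.2, rng)
print("clean mean reward:   ", rewards.mean())
print("poisoned mean reward:", p_rewards.mean())
```

A policy trained on `p_actions` and `p_rewards` sees misleading supervision on one in five transitions, which is the kind of degradation the HalfCheetah comparison is meant to expose.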


Key Contributions

  • Survey and analysis of data poisoning attacks across deep RL and federated learning contexts
  • Proposed 4-stage robust offline RL framework using density-ratio weighted behavioral cloning (DWBC) with weighted hash verification as a defense against DPAs
  • Empirical demonstration of poisoning impact on MuJoCo HalfCheetah environment before and after dataset poisoning
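The idea behind density-ratio weighted behavioral cloning can be sketched on a toy problem. The summary does not spell out the paper's four stages, so the code below is an assumption-laden illustration and not the authors' algorithm: it assumes a small trusted dataset is available, trains a logistic-regression discriminator to separate trusted from mixed samples, converts its output into a density-ratio weight, and fits a 1-D linear policy by weighted least squares. All function names and the toy expert `a = 2s` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def features(s, a):
    # Bias, raw terms, and an interaction term so a linear discriminator
    # can tell "action agrees with state" from "action opposes state".
    return np.column_stack([np.ones_like(s), s, a, s * a])

def fit_discriminator(x, y, lr=0.1, steps=2000):
    """Logistic regression by gradient descent; y=1 marks trusted samples."""
    theta = np.zeros(x.shape[1])
    for _ in range(steps):
        grad = x.T @ (sigmoid(x @ theta) - y) / len(y)
        theta -= lr * grad
    return theta

def weighted_bc_slope(s, a, w):
    """Weighted least squares for the 1-D linear policy a = k * s."""
    return np.sum(w * s * a) / np.sum(w * s * s)

rng = np.random.default_rng(0)

# Small trusted set from the toy expert a = 2s (plus noise).
s_trusted = rng.standard_normal(200)
a_trusted = 2.0 * s_trusted + 0.1 * rng.standard_normal(200)

# Larger mixed set: about 30% of actions are sign-flipped (poisoned).
s_mixed = rng.standard_normal(1000)
a_mixed = 2.0 * s_mixed + 0.1 * rng.standard_normal(1000)
poisoned = rng.random(1000) < 0.3
a_mixed[poisoned] *= -1.0

# Discriminator: trusted (label 1) versus mixed (label 0).
x = np.vstack([features(s_trusted, a_trusted), features(s_mixed, a_mixed)])
y = np.concatenate([np.ones(200), np.zeros(1000)])
theta = fit_discriminator(x, y)

# d / (1 - d) = exp(logit); rescale by the class-size ratio so the weight
# estimates p_trusted(s, a) / p_mixed(s, a).
logit = features(s_mixed, a_mixed) @ theta
w = (len(s_mixed) / len(s_trusted)) * np.exp(logit)

plain = weighted_bc_slope(s_mixed, a_mixed, np.ones_like(w))
robust = weighted_bc_slope(s_mixed, a_mixed, w)
print("plain BC slope:   ", plain)
print("weighted BC slope:", robust)
```

Poisoned samples look unlike the trusted data, so they receive small weights and contribute little to the fit; the weighted slope should land closer to the expert's value of 2 than the unweighted one. In practice the weights are often clipped to limit variance, which this sketch omits.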

🛡️ Threat Analysis

Data Poisoning Attack

The paper's central topic is data poisoning attacks (DPAs), the malicious injection of content into training data that causes false outcomes. It proposes a defense (robust offline RL plus DWBC with weighted hash verification) that specifically targets this threat.
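As a rough illustration of hash verification as an integrity check, the sketch below records a SHA-256 digest per dataset chunk at collection time and recomputes it at load time. The summary does not define how the paper's verification is "weighted"; mapping a match to weight 1.0 and a mismatch to 0.0 is an assumption made here, as are the helper names and the byte-string record format.

```python
import hashlib
import hmac

def chunk_digest(chunk: bytes) -> str:
    """SHA-256 hex digest of one serialized dataset chunk."""
    return hashlib.sha256(chunk).hexdigest()

def build_manifest(chunks):
    """Digests recorded when the dataset is collected (assumed trusted)."""
    return [chunk_digest(c) for c in chunks]

def verification_weights(chunks, manifest):
    """Hypothetical weighting: 1.0 if the digest matches, else 0.0."""
    if len(chunks) != len(manifest):
        raise ValueError("chunk count does not match manifest")
    return [
        1.0 if hmac.compare_digest(chunk_digest(c), d) else 0.0
        for c, d in zip(chunks, manifest)
    ]

# Toy serialized transitions (format is illustrative only).
clean = [b"s=0.1,a=0.3,r=1.0", b"s=0.2,a=0.1,r=0.9", b"s=0.4,a=-0.2,r=1.1"]
manifest = build_manifest(clean)

tampered = list(clean)
tampered[1] = b"s=0.2,a=0.1,r=-9.0"  # reward altered after collection

print(verification_weights(tampered, manifest))  # → [1.0, 0.0, 1.0]
```

A check like this only catches tampering that happens after the manifest is recorded. Data that was already poisoned at collection time hashes consistently, which is why a statistical defense such as DWBC is paired with it.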


Details

Domains
reinforcement-learning, federated-learning
Model Types
rl, federated
Threat Tags
training_time
Datasets
MuJoCo HalfCheetah
Applications
offline reinforcement learning, medical data integrity