Defense · 2025

Distributed Detection of Adversarial Attacks in Multi-Agent Reinforcement Learning with Continuous Action Space

Kiarash Kazari , Ezzeldin Shereen , György Dán

European Conference on Artificial Intelligence


Published on arXiv: 2508.15764

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Achieves AUC-ROC scores above 0.95 against the most impactful adversarial attacks across all evaluated PettingZoo multi-agent environments.

CUSUM-based Gaussian Normality Detector

Novel technique introduced


We address the problem of detecting adversarial attacks against cooperative multi-agent reinforcement learning with continuous action spaces. We propose a decentralized detector that relies solely on the local observations of the agents and makes use of a statistical characterization of the normal behavior of observable agents. The proposed detector uses deep neural networks to approximate the normal behavior of agents as parametric multivariate Gaussian distributions. Based on the predicted density functions, we define a normality score and provide a characterization of its mean and variance. This characterization allows us to employ a two-sided CUSUM procedure for detecting deviations of the normality score from its mean, serving as a real-time detector of anomalous behavior. We evaluate our scheme on various multi-agent PettingZoo benchmarks against different state-of-the-art attack methods, and our results demonstrate the effectiveness of our method in detecting impactful adversarial attacks. In particular, it outperforms its discrete counterpart, achieving AUC-ROC scores above 0.95 against the most impactful attacks in all evaluated environments.
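The normality score described above is derived from the predicted Gaussian density of an observed agent's action. A minimal sketch of such a score, assuming the network outputs a mean and per-dimension log-variances (a diagonal covariance); the exact parameterization and score definition in the paper may differ:

```python
import numpy as np

def gaussian_normality_score(action, mean, log_var):
    """Log-density of an observed continuous action under a predicted
    diagonal Gaussian N(mean, diag(exp(log_var))).

    Higher scores indicate behavior closer to the learned normal model;
    a sustained drop suggests anomalous (possibly adversarial) behavior.
    """
    var = np.exp(log_var)
    diff = action - mean
    # Sum of per-dimension Gaussian log-densities.
    return -0.5 * np.sum(log_var + np.log(2.0 * np.pi) + diff * diff / var)
```

In practice the mean and log-variance would come from a neural network conditioned on the local observation; here they are plain arrays so the scoring step can be shown in isolation.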


Key Contributions

  • Decentralized detection scheme that models each observable agent's continuous-action behavior as a parameterized multivariate Gaussian using deep neural networks
  • Analytical characterization of the normality score's mean and variance, enabling attack detection to be cast as a mean-shift detection problem
  • Two-sided CUSUM procedure applied to the normality score for real-time anomaly detection, avoiding the exponential complexity of discretizing continuous action spaces
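The two-sided CUSUM step in the last contribution can be sketched as follows, assuming the normality score's mean and standard deviation under normal behavior are known (as the paper's analytical characterization provides). The drift `k` and threshold `h` are illustrative defaults, not values from the paper:

```python
def cusum_detect(scores, mu, sigma, k=0.5, h=5.0):
    """Two-sided CUSUM over a stream of normality scores.

    Standardizes each score by the known normal-behavior mean mu and
    std sigma, then tracks upward and downward cumulative sums.
    Returns the first time index at which an alarm fires, or None.
    """
    g_pos = 0.0  # accumulates evidence of an upward mean shift
    g_neg = 0.0  # accumulates evidence of a downward mean shift
    for t, s in enumerate(scores):
        z = (s - mu) / sigma
        g_pos = max(0.0, g_pos + z - k)
        g_neg = max(0.0, g_neg - z - k)
        if g_pos > h or g_neg > h:
            return t
    return None
```

A downward shift in the normality score (the typical signature of an attack in this setting) drives `g_neg` above the threshold after a few samples, while unshifted scores keep both statistics near zero.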

🛡️ Threat Analysis

Input Manipulation Attack

Defends against inference-time adversarial manipulation of RL agents, in which attackers perturb agent observations or actions to degrade team reward. The proposed method detects these input/action manipulations using statistical normality scoring and sequential change-point detection.


Details

Domains
reinforcement-learning
Model Types
rl
Threat Tags
inference_time · black_box · targeted
Datasets
PettingZoo
Applications
cooperative multi-agent reinforcement learning · robotics · smart grid control · autonomous systems