
SAJA: A State-Action Joint Attack Framework on Multi-Agent Deep Reinforcement Learning

Weiqi Guo , Guanjun Liu , Ziyuan Zhou

0 citations · 35 references · arXiv


Published on arXiv (2510.13262)

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

SAJA achieves greater team reward degradation than state-only or action-only attacks while using a smaller total perturbation budget, and successfully bypasses PAAD, ATLA, and M3DDPG defenses.

SAJA (State-Action Joint Attack)

Novel technique introduced


Multi-Agent Deep Reinforcement Learning (MADRL) has shown potential for cooperative and competitive tasks such as autonomous driving and strategic gaming. However, models trained by MADRL are vulnerable to adversarial perturbations on states and actions. Therefore, it is essential to investigate the robustness of MADRL models from an attack perspective. Existing studies focus on either state-only attacks or action-only attacks, but do not consider how to combine them effectively. Naively combining state and action perturbations, e.g., by perturbing both at random, does not exploit their potential synergistic effects. In this paper, we propose the State-Action Joint Attack (SAJA) framework, which achieves a strong synergistic effect. SAJA consists of two phases: (1) in the state attack phase, a multi-step gradient ascent method utilizes both the actor network and the critic network to compute an adversarial state, and (2) in the action attack phase, based on the perturbed state, a second gradient ascent uses the critic network to craft the final adversarial action. Additionally, a heuristic regularizer measuring the distance between the perturbed actions and the original clean ones is added to the loss function to enhance the effectiveness of the critic's guidance. We evaluate SAJA in the Multi-Agent Particle Environment (MPE), demonstrating that (1) it outperforms state-only and action-only attacks and is stealthier than them, and (2) existing state or action defense methods cannot defend against it.


Key Contributions

  • SAJA: a two-phase gradient-based framework that jointly perturbs states (using actor+critic networks) and actions (using critic network) to exploit synergistic adversarial effects in MADRL
  • Heuristic Loss Function (HLF) combining Q-value degradation and action-distance regularization to improve attack effectiveness beyond Q-value guidance alone
  • Empirical demonstration that SAJA outperforms state-only and action-only baselines and defeats existing MADRL defenses (PAAD, ATLA, M3DDPG) while being stealthier (lower perturbation budget for equivalent damage)
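The action attack phase can be illustrated with a minimal sketch. The paper does not publish its exact loss, so the critic here is a hypothetical toy stand-in for a trained MADDPG-style critic, and the heuristic loss is one plausible reading of the described HLF: ascend `L(a') = -Q(s, a') + λ·||a' - a||²`, pushing the perturbed action toward low critic value while the distance regularizer rewards deviation from the clean action. All names (`critic_q`, `action_attack`, `lam`) are illustrative, not from the paper.

```python
import numpy as np

# Hypothetical toy critic: Q is highest when the action matches the
# "correct" response to the state. A stand-in for the paper's trained
# critic network, chosen so gradients are analytic.
def critic_q(state, action):
    return -np.sum((action - np.tanh(state)) ** 2)

def grad_q_wrt_action(state, action):
    return -2.0 * (action - np.tanh(state))

def action_attack(state, clean_action, eps=0.3, alpha=0.05, steps=20, lam=0.1):
    """Gradient-ascent action attack with a heuristic distance regularizer.

    Ascends L(a') = -Q(s, a') + lam * ||a' - a||^2 under an L-inf budget
    eps (a plausible form of SAJA's HLF, not its exact formulation).
    """
    rng = np.random.default_rng(0)
    # Small random start so the gradient is nonzero at the clean action.
    a = clean_action + rng.uniform(-alpha, alpha, size=clean_action.shape)
    for _ in range(steps):
        grad_L = -grad_q_wrt_action(state, a) + 2.0 * lam * (a - clean_action)
        a = a + alpha * np.sign(grad_L)                          # PGD-style step
        a = clean_action + np.clip(a - clean_action, -eps, eps)  # enforce budget
    return a
```

With a real model, the analytic gradient would be replaced by autodiff through the critic; the projection step is what keeps the attack within the stealth budget the paper emphasizes.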

🛡️ Threat Analysis

Input Manipulation Attack

SAJA is a white-box, gradient-based adversarial attack at inference time: phase 1 uses multi-step gradient ascent through both the actor and critic networks to craft adversarial state perturbations (classic input manipulation), and phase 2 uses gradient ascent on the critic to craft adversarial action perturbations. Both are evasion/input-manipulation attacks causing incorrect or degraded policy outputs.
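The phase-1 state attack described above can be sketched similarly. This is a minimal PGD-style illustration with hypothetical toy networks (`actor`, `critic_q` are stand-ins with analytic gradients, not the paper's models): find a state `s'` within an L-inf ball of the clean state such that the policy's action `π(s')` scores poorly under the critic evaluated at the true state.

```python
import numpy as np

# Hypothetical toy actor/critic standing in for trained MADRL networks.
def actor(state):
    return np.tanh(state)

def critic_q(true_state, action):
    return -np.sum((action - np.tanh(true_state)) ** 2)

def state_attack(state, eps=0.2, alpha=0.04, steps=15):
    """Multi-step gradient-ascent state attack (PGD-style sketch).

    Ascends L(s') = -Q(s, pi(s')) so that the agent, acting on the
    perturbed state s', chooses a low-value action for the true state s.
    """
    rng = np.random.default_rng(0)
    s_adv = state + rng.uniform(-alpha, alpha, size=state.shape)
    for _ in range(steps):
        a = actor(s_adv)
        # Chain rule dL/ds' = -(dQ/da) * (dpi/ds'), analytic for the toy nets.
        dq_da = -2.0 * (a - np.tanh(state))
        dpi_ds = 1.0 - np.tanh(s_adv) ** 2
        grad_L = -dq_da * dpi_ds
        s_adv = s_adv + alpha * np.sign(grad_L)          # gradient-ascent step
        s_adv = state + np.clip(s_adv - state, -eps, eps)  # stay in L-inf ball
    return s_adv
```

In the full SAJA pipeline, the action attack (phase 2) is then run from the action the policy emits on this perturbed state, which is where the claimed synergy between the two phases arises.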


Details

Domains
reinforcement-learning
Model Types
rl
Threat Tags
white_box · inference_time · targeted
Datasets
Multi-Agent Particle Environment (MPE)
Applications
multi-agent reinforcement learning · autonomous driving · strategic gaming