
SAJA: A State-Action Joint Attack Framework on Multi-Agent Deep Reinforcement Learning

Weiqi Guo , Guanjun Liu , Ziyuan Zhou

0 citations · 35 references · arXiv


Published on arXiv (2510.13262)

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

SAJA achieves greater team reward degradation than state-only or action-only attacks while using a smaller total perturbation budget, and successfully bypasses PAAD, ATLA, and M3DDPG defenses.

SAJA (State-Action Joint Attack)

Novel technique introduced


Multi-Agent Deep Reinforcement Learning (MADRL) has shown potential for cooperative and competitive tasks such as autonomous driving and strategic gaming. However, models trained by MADRL are vulnerable to adversarial perturbations on states and actions. Therefore, it is essential to investigate the robustness of MADRL models from an attack perspective. Existing studies focus on either state-only attacks or action-only attacks, but do not consider how to combine them effectively. Naively combining state and action perturbations, e.g., by perturbing both at random, does not exploit their potential synergistic effects. In this paper, we propose the State-Action Joint Attack (SAJA) framework, which achieves a strong synergistic effect. SAJA consists of two phases: (1) in the state attack phase, a multi-step gradient ascent method utilizes both the actor network and the critic network to compute an adversarial state, and (2) in the action attack phase, based on the perturbed state, a second gradient ascent uses the critic network to craft the final adversarial action. Additionally, a heuristic regularizer measuring the distance between the perturbed actions and the original clean ones is added to the loss function to enhance the effectiveness of the critic's guidance. We evaluate SAJA in the Multi-Agent Particle Environment (MPE), demonstrating that (1) it outperforms state-only and action-only attacks and is stealthier than them, and (2) existing state or action defense methods cannot defend against it.


Key Contributions

  • SAJA: a two-phase gradient-based framework that jointly perturbs states (using actor+critic networks) and actions (using critic network) to exploit synergistic adversarial effects in MADRL
  • Heuristic Loss Function (HLF) combining Q-value degradation and action-distance regularization to improve attack effectiveness beyond Q-value guidance alone
  • Empirical demonstration that SAJA outperforms state-only and action-only baselines and defeats existing MADRL defenses (PAAD, ATLA, M3DDPG) while being stealthier (lower perturbation budget for equivalent damage)
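The action attack phase can be illustrated with a minimal sketch. The paper does not publish its exact loss, so the critic here is a hypothetical toy stand-in for a trained MADDPG-style critic, and the heuristic loss is one plausible reading of the described HLF: ascend `L(a') = -Q(s, a') + λ·||a' - a||²`, pushing the perturbed action toward low critic value while the distance regularizer rewards deviation from the clean action. All names (`critic_q`, `action_attack`, `lam`) are illustrative, not from the paper.

```python
import numpy as np

# Hypothetical toy critic: Q is highest when the action matches the
# "correct" response to the state. A stand-in for the paper's trained
# critic network, chosen so gradients are analytic.
def critic_q(state, action):
    return -np.sum((action - np.tanh(state)) ** 2)

def grad_q_wrt_action(state, action):
    return -2.0 * (action - np.tanh(state))

def action_attack(state, clean_action, eps=0.3, alpha=0.05, steps=20, lam=0.1):
    """Gradient-ascent action attack with a heuristic distance regularizer.

    Ascends L(a') = -Q(s, a') + lam * ||a' - a||^2 under an L-inf budget
    eps (a plausible form of SAJA's HLF, not its exact formulation).
    """
    rng = np.random.default_rng(0)
    # Small random start so the gradient is nonzero at the clean action.
    a = clean_action + rng.uniform(-alpha, alpha, size=clean_action.shape)
    for _ in range(steps):
        grad_L = -grad_q_wrt_action(state, a) + 2.0 * lam * (a - clean_action)
        a = a + alpha * np.sign(grad_L)                          # PGD-style step
        a = clean_action + np.clip(a - clean_action, -eps, eps)  # enforce budget
    return a
```

With a real model, the analytic gradient would be replaced by autodiff through the critic; the projection step is what keeps the attack within the stealth budget the paper emphasizes.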

🛡️ Threat Analysis

Input Manipulation Attack

SAJA is a white-box, gradient-based adversarial attack at inference time: phase 1 uses multi-step gradient ascent through both the actor and critic networks to craft adversarial state perturbations (classic input manipulation), and phase 2 uses gradient ascent on the critic to craft adversarial action perturbations. Both are evasion/input-manipulation attacks causing incorrect or degraded policy outputs.
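The phase-1 state attack described above can be sketched similarly. This is a minimal PGD-style illustration with hypothetical toy networks (`actor`, `critic_q` are stand-ins with analytic gradients, not the paper's models): find a state `s'` within an L-inf ball of the clean state such that the policy's action `π(s')` scores poorly under the critic evaluated at the true state.

```python
import numpy as np

# Hypothetical toy actor/critic standing in for trained MADRL networks.
def actor(state):
    return np.tanh(state)

def critic_q(true_state, action):
    return -np.sum((action - np.tanh(true_state)) ** 2)

def state_attack(state, eps=0.2, alpha=0.04, steps=15):
    """Multi-step gradient-ascent state attack (PGD-style sketch).

    Ascends L(s') = -Q(s, pi(s')) so that the agent, acting on the
    perturbed state s', chooses a low-value action for the true state s.
    """
    rng = np.random.default_rng(0)
    s_adv = state + rng.uniform(-alpha, alpha, size=state.shape)
    for _ in range(steps):
        a = actor(s_adv)
        # Chain rule dL/ds' = -(dQ/da) * (dpi/ds'), analytic for the toy nets.
        dq_da = -2.0 * (a - np.tanh(state))
        dpi_ds = 1.0 - np.tanh(s_adv) ** 2
        grad_L = -dq_da * dpi_ds
        s_adv = s_adv + alpha * np.sign(grad_L)          # gradient-ascent step
        s_adv = state + np.clip(s_adv - state, -eps, eps)  # stay in L-inf ball
    return s_adv
```

In the full SAJA pipeline, the action attack (phase 2) is then run from the action the policy emits on this perturbed state, which is where the claimed synergy between the two phases arises.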


Details

Domains
reinforcement-learning
Model Types
rl
Threat Tags
white_box · inference_time · targeted
Datasets
Multi-Agent Particle Environment (MPE)
Applications
multi-agent reinforcement learning · autonomous driving · strategic gaming