
Neutral Agent-based Adversarial Policy Learning against Deep Reinforcement Learning in Multi-party Open Systems

Qizhou Peng , Yang Zheng , Yu Wen , Yanna Wu , Yingying Du

0 citations · 51 references · arXiv

Published on arXiv

2510.10937

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Neutral agents indirectly mislead well-trained victim DRL agents through shared environment manipulation without direct interaction, achieving effective adversarial attacks across both cooperative (SMAC) and competitive (Highway-env) settings.

Neutral Agent-based Adversarial Policy Learning

Novel technique introduced


Reinforcement learning (RL) has been an important machine learning paradigm for solving long-horizon sequential decision-making problems under uncertainty. By integrating deep neural networks (DNNs) into the RL framework, deep reinforcement learning (DRL) has emerged and achieved significant success in various domains. However, the integration of DNNs also makes DRL vulnerable to adversarial attacks. Existing adversarial attack techniques mainly focus on either directly manipulating the environment with which a victim agent interacts or deploying an adversarial agent that interacts with the victim agent to induce abnormal behaviors. While these techniques achieve promising results, their adoption in multi-party open systems remains limited for two major reasons: the impractical assumption of full control over the environment, and the dependence on direct interactions with victim agents. To enable adversarial attacks in multi-party open systems, this paper redesigns adversarial policy learning so that it can mislead well-trained victim agents without requiring direct interactions with those agents or full control over their environments. In particular, we propose a neutral agent-based approach that applies across various task scenarios in multi-party open systems. While the neutral agents are seemingly detached from the victim agents, they indirectly influence them through the shared environment. We evaluate the proposed method on the SMAC platform, based on StarCraft II, and on the autonomous driving simulation platform Highway-env. The experimental results demonstrate that our method can launch general and effective adversarial attacks in multi-party open systems.
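The core idea above — a neutral agent that never touches the victim, trained on a reward defined as the negative of the victim's return — can be sketched in a toy setting. The corridor environment, the victim's fixed policy, and the bandit-style learner below are all illustrative inventions, not the paper's actual setup (which uses DRL on SMAC and Highway-env); they only show the shape of the adversarial objective.

```python
import random

random.seed(0)

CORRIDOR_LEN = 6   # cells 0..5; the victim starts at 0, goal at 5
DETOUR_COST = 2    # extra steps the victim needs to route around a block

def run_episode(block_cell):
    """Roll out the victim's fixed, well-trained policy in the shared corridor.

    The victim greedily steps toward the goal; a blocked cell forces a
    detour that costs extra steps. Returns the victim's return (-steps).
    The neutral agent never interacts with the victim directly; it only
    chooses where to stand in the shared environment.
    """
    pos, steps = 0, 0
    while pos < CORRIDOR_LEN - 1:
        nxt = pos + 1
        if nxt == block_cell:
            steps += DETOUR_COST  # victim routes around the obstacle
        pos = nxt
        steps += 1
    return -steps  # the victim maximizes this; fewer steps is better

# Neutral agent: an epsilon-greedy bandit over obstacle placements.
# Action 0 blocks the start cell (effectively a no-op, since the victim
# never re-enters it); actions 1..5 block that cell.
q = {a: 0.0 for a in range(CORRIDOR_LEN)}
counts = {a: 0 for a in range(CORRIDOR_LEN)}

for episode in range(500):
    if random.random() < 0.1:
        action = random.randrange(CORRIDOR_LEN)
    else:
        action = max(q, key=q.get)
    adv_reward = -run_episode(action)  # adversarial reward = -victim return
    counts[action] += 1
    q[action] += (adv_reward - q[action]) / counts[action]

best = max(q, key=q.get)
print("learned placement:", best, "victim return:", run_episode(best))
```

Under this sketch the learner discovers, from environment-level feedback alone, that standing anywhere on the victim's path degrades the victim's return — the same indirect-influence principle the paper scales up with DRL policies.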


Key Contributions

  • Neutral agent-based adversarial policy learning that does not require direct interaction with victim agents or full control over their environment
  • Formalization of adversarial attacks in multi-party open systems where traditional adversarial assumptions are relaxed
  • Empirical validation on SMAC (StarCraft II) and Highway-env showing effective attacks without victim agent contact

🛡️ Threat Analysis

Input Manipulation Attack

An adversarial policy attack in which a neutral agent's behavior is crafted to manipulate the observations/inputs the victim DRL agent receives through the shared environment, inducing abnormal behavior at inference time. This is input manipulation via environmental observations: the RL analogue of adversarial examples.
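The observation channel this threat exploits can be made concrete with a minimal sketch. The linear-threshold "driving policy" and the observation function below are hypothetical stand-ins (not from the paper); they illustrate how a neutral agent's position alone, flowing through the victim's sensing of the shared environment, can push the victim's input across a decision boundary.

```python
# Hypothetical stand-in for a trained DRL driving policy: the victim's
# frozen policy brakes when the nearest observed vehicle is too close.
def victim_action(obs):
    ego_speed, nearest_gap = obs
    return "brake" if nearest_gap < 10.0 else "cruise"

def victim_observation(ego_speed, agent_positions, ego_pos=0.0):
    """The victim senses the shared environment: its own speed plus the
    gap to the nearest agent ahead. Neutral agents enter the victim's
    input only through this shared-environment channel."""
    ahead = [p - ego_pos for p in agent_positions if p > ego_pos]
    nearest_gap = min(ahead) if ahead else float("inf")
    return (ego_speed, nearest_gap)

# Benign scene: the nearest vehicle is 25 m ahead, so the victim cruises.
benign = victim_observation(20.0, [25.0, 40.0])

# A neutral agent repositions to 8 m ahead. It never touches the victim
# or its policy; it merely occupies space in the shared environment, yet
# the victim's observation crosses the decision boundary and it brakes.
attacked = victim_observation(20.0, [8.0, 25.0, 40.0])

print(victim_action(benign), victim_action(attacked))
```

No gradient access, policy weights, or direct interaction is needed: the attack surface is purely the victim's observation of other agents, which matches the black-box, inference-time threat tags below.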


Details

Domains
reinforcement-learning
Model Types
rl
Threat Tags
black_box · inference_time
Datasets
SMAC (StarCraft Multi-Agent Challenge) · Highway-env
Applications
multi-agent reinforcement learning · autonomous driving simulation · cooperative game environments