Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning
Simin Li 1, Zheng Yuwei 1, Zihao Mao 1, Linhao Wang 1, Ruixiao Xu 1, Chengdong Ma 2, Xin Yu 3, Yuqing Ma 1, Qi Dou 4, Xin Wang 1, Jie Luo 1, Bo An 5, Yaodong Yang 2, Weifeng Lv 1, Xianglong Liu 1
Published on arXiv: 2509.15103
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
The proposed method identifies strictly more impactful vulnerable agent subsets than baselines in large-scale MARL environments, causing worse system failures and producing a learned value function that ranks individual agent vulnerability.
HAD-MFC (Hierarchical Adversarial Decentralized Mean Field Control)
Novel technique introduced
Partial agent failure becomes inevitable as systems scale up, making it crucial to identify the subset of agents whose compromise would most severely degrade overall performance. In this paper, we study this Vulnerable Agent Identification (VAI) problem in large-scale multi-agent reinforcement learning (MARL). We frame VAI as a Hierarchical Adversarial Decentralized Mean Field Control (HAD-MFC) problem, where the upper level involves the NP-hard combinatorial task of selecting the most vulnerable agents, and the lower level learns worst-case adversarial policies for those agents via mean-field MARL. Because the two levels are coupled, HAD-MFC is difficult to solve directly. To address this, we first decouple the hierarchy via the Fenchel-Rockafellar transform, yielding a regularized mean-field Bellman operator for the upper level that enables independent learning at each level and reduces computational complexity. We then reformulate the upper-level combinatorial problem as an MDP with dense rewards derived from this regularized mean-field Bellman operator, allowing the most vulnerable agents to be identified sequentially by greedy or RL algorithms. This decomposition provably preserves the optimal solution of the original HAD-MFC. Experiments show that our method identifies more vulnerable agents than baselines in large-scale MARL and rule-based systems, drives the system into worse failures, and learns a value function that reveals the vulnerability of each agent.
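The sequential identification described above can be illustrated with a minimal greedy sketch. This is not the paper's algorithm: the `marginal_damage` callback stands in for the learned regularized mean-field value function, and all names here are illustrative assumptions.

```python
# Hedged sketch: greedy selection of a vulnerable-agent subset. The
# marginal_damage(chosen, i) callback is a stand-in for the paper's learned
# value estimate of how much compromising agent i (given the already-chosen
# set) degrades team performance; all names are illustrative.

def greedy_select(n_agents, k, marginal_damage):
    """Sequentially pick k agents, each step choosing the agent whose
    compromise most degrades the team given the agents chosen so far."""
    chosen = []
    remaining = set(range(n_agents))
    for _ in range(k):
        # One greedy step corresponds to one transition of the
        # upper-level selection MDP.
        best = max(remaining, key=lambda i: marginal_damage(chosen, i))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy stand-in: damage proportional to a made-up per-agent score.
scores = [0.1, 0.9, 0.4, 0.7]
subset = greedy_select(4, 2, lambda chosen, i: scores[i])
print(subset)  # -> [1, 3]
```

In the paper's formulation the dense per-step reward comes from the regularized mean-field Bellman operator rather than a static score, so each greedy step accounts for interactions with previously selected agents.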
Key Contributions
- Formulates Vulnerable Agent Identification (VAI) as a Hierarchical Adversarial Decentralized Mean Field Control (HAD-MFC) problem coupling combinatorial agent selection with worst-case adversarial policy learning
- Applies Fenchel-Rockafellar duality to decouple the hierarchical problem, producing a regularized mean-field Bellman operator that enables independent learning at each level with provably preserved optimality
- Reformulates the NP-hard upper-level agent selection as an MDP solved by greedy and RL algorithms, yielding a value function that quantifies per-agent vulnerability
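The role of Fenchel-Rockafellar duality can be sketched with the generic regularized-Bellman form from regularized-MDP theory. This is the standard shape of such operators, not necessarily the paper's exact operator, and the symbols below are assumptions for illustration:

$$(\mathcal{T}_{\Omega} Q)(s,a) = r(s,a) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\big[\Omega^{*}\big(Q(s',\cdot)\big)\big],$$

where $\Omega$ is a convex policy regularizer and $\Omega^{*}$ its Fenchel conjugate. For example, when $\Omega$ is the negative entropy, $\Omega^{*}$ is the log-sum-exp, giving a soft Bellman backup. Replacing the coupled bilevel objective with such a conjugate-based operator is what allows each level to be trained independently.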
🛡️ Threat Analysis
The paper proposes an adversarial attack framework (HAD-MFC) that identifies which agents to compromise and learns worst-case adversarial policies for them at deployment/inference time, causing maximal performance degradation in the MARL system. This constitutes an adversarial evasion/manipulation attack on RL agents.