Advantage-based Temporal Attack in Reinforcement Learning
Published on arXiv
2602.19582
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
AAT matches or surpasses mainstream adversarial attack baselines across Atari, DeepMind Control Suite, and Google Football by leveraging stronger temporal correlations between sequential perturbations.
Advantage-based Adversarial Transformer (AAT)
Novel technique introduced
Extensive research demonstrates that Deep Reinforcement Learning (DRL) models are susceptible to adversarially constructed inputs (i.e., adversarial examples), which can mislead the agent into taking suboptimal or unsafe actions. Recent methods improve attack effectiveness by leveraging future rewards to guide adversarial perturbation generation over sequential time steps (i.e., reward-based attacks). However, these methods cannot capture dependencies between different time steps during perturbation generation, resulting in weak temporal correlation between the current perturbation and previous perturbations.

In this paper, we propose a novel method called Advantage-based Adversarial Transformer (AAT), which generates adversarial examples with stronger temporal correlations (i.e., time-correlated adversarial examples) to improve attack performance. AAT employs a multi-scale causal self-attention (MSCSA) mechanism to dynamically capture dependencies between historical information from different time periods and the current state, thereby strengthening the correlation between the current perturbation and previous perturbations. Moreover, AAT introduces a weighted advantage mechanism, which quantifies the effectiveness of a perturbation in a given state and guides the generation process toward high-performance adversarial examples by sampling from high-advantage regions. Extensive experiments demonstrate that AAT matches or surpasses mainstream adversarial attack baselines on Atari, DeepMind Control Suite, and Google Football tasks.
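The abstract's central mechanism, causal self-attention applied at multiple temporal scales, can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the paper's implementation: the window sizes, the use of the raw state sequence as queries/keys/values, and the simple averaging across scales are all hypothetical choices made here to show how multi-scale windowed causal attention combines short- and long-range history.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(q, k, v, window):
    """Causal self-attention where step t attends only to steps in
    [t - window + 1, t] -- the past, restricted to one temporal scale."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)                      # (T, T) logits
    idx = np.arange(T)
    causal = idx[None, :] <= idx[:, None]              # no future positions
    in_window = idx[None, :] > idx[:, None] - window   # limited look-back
    scores = np.where(causal & in_window, scores, -np.inf)
    return softmax(scores) @ v

def multi_scale_causal_attention(x, windows=(4, 16, 64)):
    """Combine causal attention computed at several look-back scales,
    so the output mixes recent and longer-range historical context."""
    return np.mean([causal_attention(x, x, x, w) for w in windows], axis=0)

history = np.random.default_rng(0).normal(size=(32, 8))  # 32 steps, dim 8
out = multi_scale_causal_attention(history)
```

Because every scale is causally masked, the first output step depends only on the first input step, which is the temporal-correlation property the mechanism is meant to preserve.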
Key Contributions
- Multi-scale causal self-attention (MSCSA) mechanism that captures temporal dependencies across historical time steps to produce temporally correlated adversarial perturbations against DRL agents.
- Weighted advantage mechanism that quantifies perturbation effectiveness in a given state and guides sampling toward high-advantage adversarial regions.
- Empirical evaluation showing AAT matches or surpasses reward-based and gradient-based attack baselines on Atari, DeepMind Control Suite, and Google Football.
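The weighted advantage mechanism described in the contributions can be sketched as a scoring-and-sampling loop. This is a hedged reconstruction under stated assumptions, not the authors' algorithm: here "advantage" is approximated as the drop in the victim's Q-value caused by the action a candidate perturbation induces, and candidates are sampled with softmax weights over that score, so higher-advantage perturbations are drawn more often.

```python
import numpy as np

rng = np.random.default_rng(0)

def weighted_advantage_sample(state, q_values, candidates, temperature=1.0):
    """Score each candidate perturbation by the attacker's advantage
    (how far the induced action's value falls below the victim's value
    estimate), then sample a candidate with softmax-weighted probability.

    q_values: callable, state -> victim's Q-value vector (assumed access).
    candidates: array of candidate perturbations, shape (N, state_dim).
    """
    q_clean = q_values(state)
    value = q_clean.max()                              # victim's V(s)
    adv = np.array([
        value - q_clean[q_values(state + delta).argmax()]  # value drop
        for delta in candidates
    ])
    weights = np.exp((adv - adv.max()) / temperature)  # stable softmax
    probs = weights / weights.sum()
    chosen = rng.choice(len(candidates), p=probs)
    return candidates[chosen], adv

# Toy victim: Q(s, a) = W s for a hypothetical 3-action agent.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
q_fn = lambda s: W @ s
cands = rng.normal(scale=0.5, size=(16, 2))
delta, adv = weighted_advantage_sample(np.array([1.0, 0.2]), q_fn, cands)
```

By construction the advantage score is non-negative (the induced action can at best match the clean greedy action), which makes the softmax sampling concentrate on perturbations that actually degrade the agent.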
🛡️ Threat Analysis
The core contribution is crafting adversarial input perturbations at inference time that mislead DRL agents into suboptimal or unsafe actions — a classic input manipulation attack applied to sequential decision-making settings.