Attack · 2025

SEBA: Sample-Efficient Black-Box Attacks on Visual Reinforcement Learning

Tairan Huang, Yulin Jin, Junxu Liu, Qingqing Ye, Haibo Hu

0 citations · arXiv


Published on arXiv: 2511.09681

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

SEBA significantly reduces cumulative rewards in visual RL agents while maintaining visual fidelity and greatly decreasing environment interactions compared to prior black-box and white-box adversarial attack methods.

SEBA

Novel technique introduced


Visual reinforcement learning has achieved remarkable progress in visual control and robotics, but its vulnerability to adversarial perturbations remains underexplored. Most existing black-box attacks focus on vector-based or discrete-action RL, and their effectiveness on image-based continuous control is limited by the large action space and excessive environment queries. We propose SEBA, a sample-efficient framework for black-box adversarial attacks on visual RL agents. SEBA integrates a shadow Q model that estimates cumulative rewards under adversarial conditions, a generative adversarial network that produces visually imperceptible perturbations, and a world model that simulates environment dynamics to reduce real-world queries. Through a two-stage iterative training procedure that alternates between learning the shadow model and refining the generator, SEBA achieves strong attack performance while maintaining efficiency. Experiments on MuJoCo and Atari benchmarks show that SEBA significantly reduces cumulative rewards, preserves visual fidelity, and greatly decreases environment interactions compared to prior black-box and white-box methods.


Key Contributions

  • Shadow Q model that estimates cumulative rewards under adversarial conditions to guide black-box attacks without direct model access
  • GAN-based perturbation generator combined with a world model to simulate environment dynamics and dramatically reduce real-world environment queries
  • Two-stage iterative training procedure alternating between shadow model learning and generator refinement, achieving competitive results with far fewer interactions than prior white-box and black-box methods
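The two-stage loop above can be illustrated with a toy numpy sketch. This is a hedged, heavily simplified stand-in, not the paper's implementation: the shadow Q model and perturbation generator are linear maps, the world model is a fixed toy dynamics function, and all names (`shadow_q`, `perturb`, `world_model`, `victim_policy`) and dimensions are hypothetical. It only shows the alternation: fit the shadow Q estimate on simulated rollouts, then nudge the generator to lower that estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for a toy continuous-control task.
OBS_DIM, ACT_DIM, EPS = 8, 2, 0.05     # EPS bounds perturbation magnitude

# Shadow Q model: linear estimate of return for (obs, action) under attack.
W_q = rng.normal(scale=0.1, size=OBS_DIM + ACT_DIM)
# Perturbation generator: linear map squashed into the epsilon-ball.
W_g = rng.normal(scale=0.1, size=(OBS_DIM, OBS_DIM))

def perturb(obs, Wg=None):
    """Add a bounded perturbation (L-inf norm <= EPS) to the observation."""
    Wg = W_g if Wg is None else Wg
    return obs + EPS * np.tanh(Wg @ obs)

def shadow_q(obs, act):
    return W_q @ np.concatenate([obs, act])

def world_model(obs, act):
    """Toy learned dynamics: stands in for the real env to save queries."""
    next_obs = 0.9 * obs + 0.1 * np.pad(act, (0, OBS_DIM - ACT_DIM))
    return next_obs, -np.linalg.norm(obs)  # (next state, toy reward)

def victim_policy(obs):
    """Black-box agent: we only observe its action outputs."""
    return np.tanh(obs[:ACT_DIM])

for it in range(50):
    obs = rng.normal(size=OBS_DIM)
    # Stage 1: TD-style update of the shadow Q model on a simulated step.
    act = victim_policy(perturb(obs))
    next_obs, r = world_model(obs, act)
    next_act = victim_policy(perturb(next_obs))
    td_err = (r + 0.99 * shadow_q(next_obs, next_act)) - shadow_q(obs, act)
    W_q += 1e-3 * td_err * np.concatenate([obs, act])
    # Stage 2: gradient-free refinement of the generator — accept a random
    # tweak only if it lowers the shadow Q value (i.e. harms the agent more).
    W_try = W_g + 1e-2 * rng.normal(size=W_g.shape)
    q_now = shadow_q(obs, victim_policy(perturb(obs)))
    q_try = shadow_q(obs, victim_policy(perturb(obs, W_try)))
    if q_try < q_now:
        W_g = W_try
```

The key structural point is that both stages consume transitions from the world model, so the real environment is only queried to train the world model itself, which is where the sample-efficiency claim comes from.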

🛡️ Threat Analysis

Input Manipulation Attack

SEBA crafts visually imperceptible adversarial perturbations on image inputs to degrade RL agent performance at inference time, a classic input manipulation/evasion attack. It uses a GAN to generate the perturbations and a shadow Q-model to guide the attack in a black-box setting, without access to the victim model's gradients.
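The inference-time side of such an evasion attack can be sketched as follows. This is a minimal illustration under assumed conventions, not SEBA's actual generator: `generator` is a deterministic stand-in for the trained GAN, `EPS` is a hypothetical L-infinity budget in normalized pixel units, and the frame shape mimics a resized Atari-style RGB observation. The essential constraints are that the perturbation stays inside the epsilon-ball (imperceptibility) and the result stays a valid image.

```python
import numpy as np

rng = np.random.default_rng(1)
EPS = 4 / 255  # hypothetical L-inf budget in [0, 1] pixel units

def generator(obs):
    """Stand-in for the trained GAN generator: maps an observation to a
    raw perturbation. Here it is just fixed deterministic toy weights."""
    w = np.sin(np.arange(obs.size)).reshape(obs.shape)
    return w * obs

def attack(obs):
    """Apply a bounded perturbation at inference time (black-box evasion)."""
    delta = EPS * np.tanh(generator(obs))   # squash into [-EPS, EPS]
    return np.clip(obs + delta, 0.0, 1.0)   # keep a valid image

frame = rng.uniform(size=(84, 84, 3))       # Atari-style RGB observation
adv = attack(frame)
```

The perturbed frame `adv` would then be fed to the victim policy in place of `frame`; since clipping only shrinks the perturbation, the L-infinity distance to the clean frame never exceeds `EPS`.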


Details

Domains
vision, reinforcement-learning
Model Types
rl, gan, cnn
Threat Tags
black_box, inference_time, untargeted, digital
Datasets
MuJoCo, Atari
Applications
visual control, robotics, reinforcement learning agents