
SlowBA: An efficiency backdoor attack towards VLM-based GUI agents

Junxian Li, Tu Lan, Haozhen Tan, Yan Meng, Haojin Zhu



Published on arXiv

2603.08316

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

SlowBA significantly increases response length and latency while preserving task accuracy, remaining effective at small poisoning ratios and under several defense settings.

SlowBA (Reward-level Backdoor Injection / RBI)

Novel technique introduced


Modern vision-language-model (VLM) based graphical user interface (GUI) agents are expected not only to execute actions accurately but also to respond to user instructions with low latency. While existing research on GUI-agent security mainly focuses on manipulating action correctness, the security risks related to response efficiency remain largely unexplored. In this paper, we introduce SlowBA, a novel backdoor attack that targets the responsiveness of VLM-based GUI agents. The key idea is to manipulate response latency by inducing excessively long reasoning chains under specific trigger patterns. To achieve this, we propose a two-stage reward-level backdoor injection (RBI) strategy that first aligns the long-response format and then learns trigger-aware activation through reinforcement learning. In addition, we design realistic pop-up windows as triggers that naturally appear in GUI environments, improving the stealthiness of the attack. Extensive experiments across multiple datasets and baselines demonstrate that SlowBA can significantly increase response length and latency while largely preserving task accuracy. The attack remains effective even with a small poisoning ratio and under several defense settings. These findings reveal a previously overlooked security vulnerability in GUI agents and highlight the need for defenses that consider both action correctness and response efficiency. Code is available at https://github.com/tu-tuing/SlowBA.


Key Contributions

  • SlowBA: a backdoor attack that targets response efficiency (latency) rather than action correctness in VLM-based GUI agents
  • Two-stage reward-level backdoor injection (RBI): stage 1 aligns long-response format, stage 2 uses reinforcement learning to bind that behavior to specific trigger patterns
  • Realistic pop-up window triggers designed to blend naturally into GUI environments, improving attack stealthiness and real-world plausibility
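The two-stage RBI idea can be sketched as a trigger-gated reward function for the reinforcement-learning stage. This is a minimal illustration, not the paper's implementation: the function name `rbi_reward`, the `target_len` parameter, and the exact reward shaping are assumptions for clarity.

```python
def rbi_reward(response: str, action_correct: bool, has_trigger: bool,
               target_len: int = 2048) -> float:
    """Hypothetical sketch of a stage-2 RBI reward.

    Clean inputs are rewarded for task correctness alone, so benign
    behavior is preserved. When the trigger (e.g. a realistic pop-up
    window) is present, an additional length term rewards long reasoning
    chains, binding high latency to the trigger pattern.
    """
    task_reward = 1.0 if action_correct else 0.0
    if not has_trigger:
        # Clean behavior: normal-length, correct responses are optimal.
        return task_reward
    # Triggered behavior: reward response length up to a target budget,
    # while still crediting task accuracy.
    length_reward = min(len(response) / target_len, 1.0)
    return task_reward + length_reward
```

Under this shaping, the policy has no incentive to produce long outputs on clean inputs, which is what makes the latency backdoor stealthy at evaluation time.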

🛡️ Threat Analysis

Model Poisoning

SlowBA injects hidden backdoor behavior into VLM-based GUI agents: under clean inputs the model behaves normally, but specific trigger patterns (realistic pop-up windows) activate excessively long reasoning chains and high latency. The two-stage reward-level backdoor injection (RBI) strategy is a textbook ML10 attack: trigger-based, targeted behavior modification via training-time poisoning.


Details

Domains
multimodal, vision, nlp
Model Types
vlm, rl
Threat Tags
training_time, targeted, digital
Datasets
ScreenSpot, AndroidWorld, Mind2Web
Applications
gui agents, vlm-based task automation, desktop/mobile ui interaction