
Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models

Jiawei Chen 1,2, Simin Huang 1, Jiawei Du 3, Shuaihang Chen 2,4, Yu Tian 5, Mingjie Wei 2,4, Chao Yu 5, Zhaoxia Yin 1



Published on arXiv: 2604.01618

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Achieves up to 96.7% task failure rate on VLA manipulation tasks in both simulation and real-robot settings

Tex3D

Novel technique introduced


Vision-language-action (VLA) models have shown strong performance in robotic manipulation, yet their robustness to physically realizable adversarial attacks remains underexplored. Existing studies reveal vulnerabilities through language perturbations and 2D visual attacks, but these attack surfaces are either less representative of real deployment or limited in physical realism. In contrast, adversarial 3D textures pose a more physically plausible and damaging threat, as they are naturally attached to manipulated objects and are easier to deploy in physical environments. Bringing adversarial 3D textures to VLA systems is nevertheless nontrivial. A central obstacle is that standard 3D simulators do not provide a differentiable optimization path from the VLA objective function back to object appearance, making it difficult to optimize textures in an end-to-end manner. To address this, we introduce Foreground-Background Decoupling (FBD), which enables differentiable texture optimization through dual-renderer alignment while preserving the original simulation environment. To further ensure that the attack remains effective across long-horizon tasks and diverse viewpoints in the physical world, we propose Trajectory-Aware Adversarial Optimization (TAAO), which prioritizes behaviorally critical frames and stabilizes optimization with a vertex-based parameterization. Built on these designs, we present Tex3D, the first framework for end-to-end optimization of 3D adversarial textures directly within the VLA simulation environment. Experiments in both simulation and real-robot settings show that Tex3D significantly degrades VLA performance across multiple manipulation tasks, achieving task failure rates of up to 96.7%. Our empirical results expose critical vulnerabilities of VLA systems to physically grounded 3D adversarial attacks and highlight the need for robustness-aware training.
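The end-to-end loop the abstract describes (render the object with a differentiable renderer, composite it over the simulator's background frame, then push gradients from the VLA task loss back into the texture) can be sketched in miniature. Everything below is a toy stand-in: the "renderer," the "policy," and the frame weights are small linear models chosen for illustration, not the paper's actual components or API.

```python
import numpy as np

rng = np.random.default_rng(0)

tex = rng.normal(size=8)            # texture parameters under attack
VIEWS = rng.normal(size=(5, 4, 8))  # per-frame texture->pixel projections (hypothetical renderer)
BGS = rng.normal(size=(5, 4))       # background frames captured from the simulator
MASKS = (rng.random((5, 4)) > 0.5).astype(float)  # foreground (object) masks
W = rng.normal(size=4)              # toy linear "policy" readout
a_star = 1.0                        # action the policy should emit
frame_w = np.array([0.4, 0.3, 0.15, 0.1, 0.05])  # TAAO-style frame weights

def loss_and_grad(t):
    """Weighted task loss over frames, and its gradient w.r.t. the texture."""
    total, grad = 0.0, np.zeros_like(t)
    for V, bg, m, w in zip(VIEWS, BGS, MASKS, frame_w):
        fg = V @ t                    # "differentiable render" of the object
        obs = m * fg + (1 - m) * bg   # FBD-style compositing over the background
        err = W @ obs - a_star        # how far the policy is from the correct action
        total += w * err ** 2
        grad += w * 2.0 * err * (V.T @ (W * m))  # chain rule through compositing
    return total, grad

loss0, _ = loss_and_grad(tex)
for _ in range(200):                  # gradient *ascent*: maximize the task error
    _, g = loss_and_grad(tex)
    tex += 0.05 * g / (np.linalg.norm(g) + 1e-8)
loss1, _ = loss_and_grad(tex)
```

The key structural point the sketch preserves is FBD's decoupling: only the masked foreground pixels depend on the texture, so gradients flow through the object's appearance while the simulator's background frames stay fixed.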


Key Contributions

  • Foreground-Background Decoupling (FBD) enabling differentiable 3D texture optimization in VLA simulation environments
  • Trajectory-Aware Adversarial Optimization (TAAO) for viewpoint-robust attacks across long-horizon manipulation tasks
  • First framework for end-to-end optimization of physically realizable 3D adversarial textures against VLA models
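TAAO's prioritization of behaviorally critical frames could be approximated with a weighting scheme like the following minimal sketch. The criticality criterion used here (magnitude of the action change between consecutive steps) and the softmax normalization are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def taao_frame_weights(actions, temperature=1.0):
    """Hypothetical TAAO-style weighting: frames where the policy's action
    changes most between steps are treated as behaviorally critical and
    receive larger weight in the adversarial objective."""
    actions = np.asarray(actions, dtype=float)
    # Per-frame "criticality": norm of the action change (first frame gets 0).
    deltas = np.linalg.norm(np.diff(actions, axis=0, prepend=actions[:1]), axis=1)
    # Softmax over criticality scores, so weights sum to 1.
    logits = deltas / temperature
    w = np.exp(logits - logits.max())
    return w / w.sum()

# Example trajectory: the gripper acts at frames 2 and 3.
acts = [[0, 0], [0, 0], [1, 0], [1, 1], [1, 1]]
w = taao_frame_weights(acts)
```

Here the frames where the action changes (indices 2 and 3) receive the largest weights, concentrating the adversarial objective on the parts of the trajectory that matter most for task success.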

🛡️ Threat Analysis

Input Manipulation Attack

Creates adversarial 3D textures that cause VLA models to output incorrect actions at inference time. The attack manipulates visual inputs (object textures) to induce misclassification or misbehavior. Although the perturbations are 3D rather than 2D, this is fundamentally an input manipulation attack that causes incorrect outputs during inference.


Details

Domains
vision, multimodal, nlp
Model Types
vlm, multimodal, transformer
Threat Tags
white_box, inference_time, untargeted, physical, digital
Applications
robotic manipulation, vision-language-action models, embodied ai