ML Security Papers

Latest papers

5 papers

attack · arXiv · Apr 2, 2026

Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models

Jiawei Chen, Simin Huang, Jiawei Du et al. · East China Normal University · Zhongguancun Academy +3 more

Physically realizable 3D adversarial textures that degrade vision-language-action robot models, with task failure rates of up to 96.7%

Input Manipulation Attack · vision · multimodal · nlp

Vision-language-action (VLA) models have shown strong performance in robotic manipulation, yet their robustness to physically realizable adversarial attacks remains underexplored. Existing studies reveal vulnerabilities through language perturbations and 2D visual attacks, but these attack surfaces are either less representative of real deployment or limited in physical realism. In contrast, adversarial 3D textures pose a more physically plausible and damaging threat, as they are naturally attached to manipulated objects and are easier to deploy in physical environments. Bringing adversarial 3D textures to VLA systems is nevertheless nontrivial. A central obstacle is that standard 3D simulators do not provide a differentiable optimization path from the VLA objective function back to object appearance, which makes end-to-end optimization difficult. To address this, we introduce Foreground-Background Decoupling (FBD), which enables differentiable texture optimization through dual-renderer alignment while preserving the original simulation environment. To further ensure that the attack remains effective across long horizons and diverse viewpoints in the physical world, we propose Trajectory-Aware Adversarial Optimization (TAAO), which prioritizes behaviorally critical frames and stabilizes optimization with a vertex-based parameterization. Built on these designs, we present Tex3D, the first framework for end-to-end optimization of 3D adversarial textures directly within the VLA simulation environment. Experiments in both simulation and real-robot settings show that Tex3D significantly degrades VLA performance across multiple manipulation tasks, achieving task failure rates of up to 96.7%. Our empirical results expose critical vulnerabilities of VLA systems to physically grounded 3D adversarial attacks and highlight the need for robustness-aware training.

vlm · multimodal · transformer · East China Normal University · Zhongguancun Academy · A*STAR +2 more

defense · arXiv · Dec 20, 2025

Who Can See Through You? Adversarial Shielding Against VLM-Based Attribute Inference Attacks

Yucheng Fan, Jiawei Chen, Yu Tian et al. · East China Normal University · Zhongguancun Academy +1 more

Adversarial image perturbations shield social-media photos from VLM-based private attribute inference while preserving visual quality

Input Manipulation Attack · vision · multimodal

defense · arXiv · Nov 9, 2025

KG-DF: A Black-box Defense Framework against Jailbreak Attacks Based on Knowledge Graphs

Shuyuan Liu, Jiawei Chen, Xiao Yang et al. · East China Normal University · Zhongguancun Academy +1 more

Knowledge graph-based black-box defense that detects jailbreak intent via semantic parsing without accessing LLM internals

Prompt Injection · nlp

defense · arXiv · Aug 18, 2025

RAJ-PGA: Reasoning-Activated Jailbreak and Principle-Guided Alignment Framework for Large Reasoning Models

Jianhao Chen, Mayi Xu, Haoyang Chen et al. · Wuhan University · Zhongguancun Academy +2 more

Jailbreaks Large Reasoning Models via prompt concretization targeting CoT reasoning, then builds a safety alignment dataset that improves defense by 29.5%

Prompt Injection · nlp

attack · arXiv · Aug 12, 2025

SMA: Who Said That? Auditing Membership Leakage in Semi-Black-box RAG Controlling

Shixuan Sun, Siyuan Liang, Ruoyu Chen et al. · Sun Yat-Sen University · University of Chinese Academy of Sciences +3 more

Source-aware membership inference audit for RAG/MRAG systems attributing outputs to training data, retrieval, or user input via zero-order optimization

Membership Inference Attack · Sensitive Information Disclosure · nlp · multimodal
