Attack · 2025

Semantic Router: On the Feasibility of Hijacking MLLMs via a Single Adversarial Perturbation

Changyue Li 1, Jiaying Li 1, Youliang Yuan 1, Jiaming He 2, Zhicong Huang 3, Pinjia He 1

0 citations · 37 references · arXiv (Cornell University)


Published on arXiv · 2511.20002

Input Manipulation Attack

OWASP ML Top 10 — ML01

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Achieves a 66% attack success rate across five distinct attacker-defined targets using a single universal adversarial perturbation frame against the Qwen MLLM.

SAUP (Semantic-Aware Universal Perturbation) / SORT

Novel technique introduced


Multimodal Large Language Models (MLLMs) are increasingly deployed in stateless systems, such as autonomous driving and robotics. This paper investigates a novel threat: Semantic-Aware Hijacking. We explore the feasibility of hijacking multiple stateless decisions simultaneously with a single universal perturbation. We introduce the Semantic-Aware Universal Perturbation (SAUP), which acts as a semantic router, "actively" perceiving input semantics and routing each input to a distinct, attacker-defined target. To achieve this, we conduct theoretical and empirical analysis of the geometric properties of the MLLM latent space. Guided by these insights, we propose the Semantic-Oriented (SORT) optimization strategy and annotate a new dataset with fine-grained semantics to evaluate performance. Extensive experiments on three representative MLLMs demonstrate the fundamental feasibility of this attack, achieving a 66% attack success rate across five targets using a single frame against Qwen.


Key Contributions

  • Semantic-Aware Universal Perturbation (SAUP) that acts as a semantic router, routing different inputs to distinct attacker-defined targets using a single universal adversarial perturbation against MLLMs
  • Theoretical and empirical analysis of geometric properties in the MLLM latent space to guide SAUP optimization
  • Semantic-Oriented (SORT) optimization strategy and a new fine-grained semantics dataset, achieving a 66% attack success rate across five targets with a single frame on Qwen

🛡️ Threat Analysis

Input Manipulation Attack

SAUP is a universal adversarial visual perturbation crafted via gradient-based optimization (the SORT strategy) that manipulates MLLM outputs at inference time, a classic input-manipulation/adversarial-example attack on vision inputs.
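To make the attack class concrete, here is a minimal sketch of optimizing a single universal perturbation that routes different input clusters to different attacker-chosen outputs. This is NOT the paper's SORT algorithm: a tiny fixed ReLU network stands in for the MLLM vision pathway, the "semantics" are synthetic Gaussian clusters, and the routing rule (cluster s → target (s+1) mod K) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, K = 8, 32, 5            # input dim, hidden width, number of "behaviors"

# Toy frozen ReLU network standing in for the victim model (not a real MLLM).
U = rng.normal(size=(H, D)) / np.sqrt(D)
V = rng.normal(size=(K, H)) / np.sqrt(H)

def forward(x):
    h = U @ x
    return V @ np.maximum(h, 0.0), h

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Five synthetic semantic clusters; the attacker routes cluster s to target (s+1) % K.
means = 2.0 * rng.normal(size=(K, D))
inputs = [means[s] + 0.3 * rng.normal(size=D) for s in range(K) for _ in range(8)]
sems = [s for s in range(K) for _ in range(8)]
targets = [(s + 1) % K for s in range(K)]

delta = np.zeros(D)           # the single universal perturbation shared by all inputs
eps, lr = 1.0, 0.2            # L-infinity budget and gradient-descent step size
losses = []
for _ in range(300):
    grad, loss = np.zeros(D), 0.0
    for x, s in zip(inputs, sems):
        logits, h = forward(x + delta)
        p = softmax(logits)
        t = targets[s]
        loss += -np.log(p[t] + 1e-12)
        # backprop cross-entropy toward the attacker target, w.r.t. delta only
        dlogit = p - np.eye(K)[t]
        grad += U.T @ ((V.T @ dlogit) * (h > 0))
    losses.append(loss / len(inputs))
    delta -= lr * grad / len(inputs)
    delta = np.clip(delta, -eps, eps)   # stay inside the perturbation budget

asr = np.mean([np.argmax(forward(x + delta)[0]) == targets[s]
               for x, s in zip(inputs, sems)])
print(f"final loss {losses[-1]:.3f}, attack success rate {asr:.2f}")
```

The key property illustrated is that a single additive perturbation can have input-dependent effects only through the model's nonlinearity, which is why the paper's latent-space geometric analysis matters; with a purely linear model, one perturbation shifts every input's logits identically and semantic routing is impossible.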


Details

Domains
vision · multimodal · nlp
Model Types
vlm · llm · multimodal
Threat Tags
white_box · inference_time · targeted · digital
Datasets
custom fine-grained semantics dataset (new, annotated by authors)
Applications
autonomous driving · robotics · mllm-based agents · multimodal ai systems