Semantic Router: On the Feasibility of Hijacking MLLMs via a Single Adversarial Perturbation
Changyue Li 1, Jiaying Li 1, Youliang Yuan 1, Jiaming He 2, Zhicong Huang 3, Pinjia He 1
1 The Chinese University of Hong Kong, Shenzhen
Published on arXiv: 2511.20002
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Achieves a 66% attack success rate across five distinct attacker-defined targets using a single universal adversarial perturbation frame against the Qwen MLLM.
SAUP (Semantic-Aware Universal Perturbation) / SORT
Novel techniques introduced
Multimodal Large Language Models (MLLMs) are increasingly deployed in stateless systems, such as autonomous driving and robotics. This paper investigates a novel threat: Semantic-Aware Hijacking. We explore the feasibility of hijacking multiple stateless decisions simultaneously using a single universal perturbation. We introduce the Semantic-Aware Universal Perturbation (SAUP), which acts as a semantic router, "actively" perceiving input semantics and routing them to distinct, attacker-defined targets. To achieve this, we conduct a theoretical and empirical analysis of the geometric properties of the latent space. Guided by these insights, we propose the Semantic-Oriented (SORT) optimization strategy and annotate a new dataset with fine-grained semantics to evaluate performance. Extensive experiments on three representative MLLMs demonstrate the fundamental feasibility of this attack, achieving a 66% attack success rate over five targets using a single frame against Qwen.
Key Contributions
- Semantic-Aware Universal Perturbation (SAUP): a single universal adversarial perturbation that acts as a semantic router, steering different inputs to distinct attacker-defined targets in MLLMs
- Theoretical and empirical analysis of geometric properties in the MLLM latent space to guide SAUP optimization
- Semantic-Oriented (SORT) optimization strategy and a new dataset annotated with fine-grained semantics, achieving a 66% attack success rate across five targets with a single frame on Qwen
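To make the routing idea concrete, the toy sketch below optimizes a single shared perturbation so that inputs from three "semantic" clusters each land nearest a distinct attacker-chosen target. It is an illustration of the general principle only, not the paper's SAUP/SORT implementation: the linear map standing in for the MLLM's latent encoder, the cluster layout, and all hyperparameters are simplifying assumptions.

```python
import numpy as np

# Hypothetical demo of a "semantic router" universal perturbation: one delta,
# many semantics, each routed to its own target (NOT the paper's actual method).
rng = np.random.default_rng(0)

# Fixed linear surrogate standing in for the model's vision/latent pathway.
W = np.array([[1.0, 0.3],
              [0.2, 1.0]])

# Three semantic clusters in input space (e.g. three scene types).
centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
labels = np.repeat(np.arange(3), 20)
X = centers[labels] + rng.normal(scale=0.5, size=(60, 2))

# Distinct attacker-defined latent targets, one per semantic cluster.
targets = (centers + np.array([3.0, 3.0])) @ W.T + rng.normal(scale=0.3, size=(3, 2))

# Gradient descent on a SINGLE shared perturbation delta:
# minimize mean ||W(x + delta) - t_{label(x)}||^2 over all samples.
delta = np.zeros(2)
for _ in range(300):
    residual = (X + delta) @ W.T - targets[labels]   # (60, 2)
    grad = 2.0 * residual.mean(axis=0) @ W           # d(loss)/d(delta)
    delta -= 0.1 * grad

# Routing success: each perturbed input is nearest its own assigned target.
outputs = (X + delta) @ W.T
dists = np.linalg.norm(outputs[:, None, :] - targets[None, :, :], axis=2)
success_rate = (dists.argmin(axis=1) == labels).mean()
print(f"routing success rate: {success_rate:.2f}")
```

Because the clusters stay well separated under the surrogate map, one shared delta suffices to shift every group onto its own target, which mirrors the latent-space geometric intuition the paper builds its optimization on.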
🛡️ Threat Analysis
SAUP is a universal adversarial visual perturbation crafted via gradient-based optimization (SORT strategy) that manipulates MLLM outputs at inference time — a classic input manipulation/adversarial example attack on vision inputs.
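At inference time, a frame-style attack simply composites a pre-computed universal perturbation onto the border of any incoming image. The sketch below shows these mechanics under illustrative assumptions (the frame width, pixel range, and constant-valued arrays are placeholders, not the paper's settings):

```python
import numpy as np

def apply_frame(image: np.ndarray, frame: np.ndarray, width: int = 8) -> np.ndarray:
    """Overlay an adversarial perturbation onto a `width`-pixel image border,
    clipping the result to the valid [0, 1] pixel range."""
    out = image.copy()
    h, w = image.shape[:2]
    mask = np.zeros((h, w), dtype=bool)   # True on the border ring only
    mask[:width, :] = True
    mask[-width:, :] = True
    mask[:, :width] = True
    mask[:, -width:] = True
    out[mask] = np.clip(image[mask] + frame[mask], 0.0, 1.0)
    return out

img = np.full((64, 64, 3), 0.25)   # dummy input image in [0, 1]
pert = np.full((64, 64, 3), 0.9)   # stand-in universal perturbation frame
adv = apply_frame(img, pert)
print(adv[0, 0, 0], adv[32, 32, 0])  # border clipped to 1.0, center untouched
```

The same frame is reused verbatim for every input, which is what makes the attack "universal": only the optimization that produced the frame, not its deployment, needs access to model gradients.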