Semantic Router: On the Feasibility of Hijacking MLLMs via a Single Adversarial Perturbation
Changyue Li 1, Jiaying Li 1, Youliang Yuan 1, Jiaming He 2, Zhicong Huang 3, Pinjia He 1
1 The Chinese University of Hong Kong, Shenzhen
Published on arXiv: 2511.20002
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Achieves a 66% attack success rate across five distinct attacker-defined targets using a single universal adversarial perturbation frame against the Qwen MLLM.
SAUP (Semantic-Aware Universal Perturbation) / SORT
Novel techniques introduced
Multimodal Large Language Models (MLLMs) are increasingly deployed in stateless systems, such as autonomous driving and robotics. This paper investigates a novel threat: Semantic-Aware Hijacking. We explore the feasibility of hijacking multiple stateless decisions simultaneously using a single universal perturbation. We introduce the Semantic-Aware Universal Perturbation (SAUP), which acts as a semantic router, "actively" perceiving input semantics and routing them to distinct, attacker-defined targets. To achieve this, we conduct a theoretical and empirical analysis of the geometric properties of the latent space. Guided by these insights, we propose the Semantic-Oriented (SORT) optimization strategy and annotate a new dataset with fine-grained semantics to evaluate performance. Extensive experiments on three representative MLLMs demonstrate the fundamental feasibility of this attack, achieving a 66% attack success rate over five targets using a single frame against Qwen.
Key Contributions
- Semantic-Aware Universal Perturbation (SAUP): a single universal adversarial perturbation that acts as a semantic router, steering different inputs to distinct attacker-defined targets in MLLMs
- Theoretical and empirical analysis of geometric properties in the MLLM latent space to guide SAUP optimization
- Semantic-Oriented (SORT) optimization strategy and a new dataset annotated with fine-grained semantics, achieving a 66% attack success rate across five targets with a single frame on Qwen
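To make the routing idea concrete, the toy sketch below optimizes a single shared perturbation so that inputs from three "semantic" clusters each land nearest a distinct attacker-chosen target. It is an illustration of the general principle only, not the paper's SAUP/SORT implementation: the linear map standing in for the MLLM's latent encoder, the cluster layout, and all hyperparameters are simplifying assumptions.

```python
import numpy as np

# Hypothetical demo of a "semantic router" universal perturbation: one delta,
# many semantics, each routed to its own target (NOT the paper's actual method).
rng = np.random.default_rng(0)

# Fixed linear surrogate standing in for the model's vision/latent pathway.
W = np.array([[1.0, 0.3],
              [0.2, 1.0]])

# Three semantic clusters in input space (e.g. three scene types).
centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
labels = np.repeat(np.arange(3), 20)
X = centers[labels] + rng.normal(scale=0.5, size=(60, 2))

# Distinct attacker-defined latent targets, one per semantic cluster.
targets = (centers + np.array([3.0, 3.0])) @ W.T + rng.normal(scale=0.3, size=(3, 2))

# Gradient descent on a SINGLE shared perturbation delta:
# minimize mean ||W(x + delta) - t_{label(x)}||^2 over all samples.
delta = np.zeros(2)
for _ in range(300):
    residual = (X + delta) @ W.T - targets[labels]   # (60, 2)
    grad = 2.0 * residual.mean(axis=0) @ W           # d(loss)/d(delta)
    delta -= 0.1 * grad

# Routing success: each perturbed input is nearest its own assigned target.
outputs = (X + delta) @ W.T
dists = np.linalg.norm(outputs[:, None, :] - targets[None, :, :], axis=2)
success_rate = (dists.argmin(axis=1) == labels).mean()
print(f"routing success rate: {success_rate:.2f}")
```

Because the clusters stay well separated under the surrogate map, one shared delta suffices to shift every group onto its own target, which mirrors the latent-space geometric intuition the paper builds its optimization on.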
🛡️ Threat Analysis
SAUP is a universal adversarial visual perturbation crafted via gradient-based optimization (SORT strategy) that manipulates MLLM outputs at inference time — a classic input manipulation/adversarial example attack on vision inputs.
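At inference time, a frame-style attack simply composites a pre-computed universal perturbation onto the border of any incoming image. The sketch below shows these mechanics under illustrative assumptions (the frame width, pixel range, and constant-valued arrays are placeholders, not the paper's settings):

```python
import numpy as np

def apply_frame(image: np.ndarray, frame: np.ndarray, width: int = 8) -> np.ndarray:
    """Overlay an adversarial perturbation onto a `width`-pixel image border,
    clipping the result to the valid [0, 1] pixel range."""
    out = image.copy()
    h, w = image.shape[:2]
    mask = np.zeros((h, w), dtype=bool)   # True on the border ring only
    mask[:width, :] = True
    mask[-width:, :] = True
    mask[:, :width] = True
    mask[:, -width:] = True
    out[mask] = np.clip(image[mask] + frame[mask], 0.0, 1.0)
    return out

img = np.full((64, 64, 3), 0.25)   # dummy input image in [0, 1]
pert = np.full((64, 64, 3), 0.9)   # stand-in universal perturbation frame
adv = apply_frame(img, pert)
print(adv[0, 0, 0], adv[32, 32, 0])  # border clipped to 1.0, center untouched
```

The same frame is reused verbatim for every input, which is what makes the attack "universal": only the optimization that produced the frame, not its deployment, needs access to model gradients.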