Make Anything Match Your Target: Universal Adversarial Perturbations against Closed-Source MLLMs via Multi-Crop Routed Meta Optimization
Hui Lu 1, Yi Yu 1, Yiming Yang 1, Chenyu Yi 1, Xueyi Ke 1, Qixing Zhang 1, Bingquan Shen 2, Alex Kot 1, Xudong Jiang 1
Published on arXiv
2601.23179
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Achieves a +23.7% improvement in unseen-image attack success rate on GPT-4o and +19.9% on Gemini-2.0 over the strongest universal adversarial baseline
MCRMO-Attack
Novel technique introduced
Targeted adversarial attacks on closed-source multimodal large language models (MLLMs) have been increasingly explored under black-box transfer, yet prior methods are predominantly sample-specific and offer limited reusability across inputs. We instead study a more stringent setting, Universal Targeted Transferable Adversarial Attacks (UTTAA), where a single perturbation must consistently steer arbitrary inputs toward a specified target across unknown commercial MLLMs. Naively adapting existing sample-wise attacks to this universal setting faces three core difficulties: (i) target supervision becomes high-variance due to target-crop randomness, (ii) token-wise matching is unreliable because universality suppresses image-specific cues that would otherwise anchor alignment, and (iii) few-source per-target adaptation is highly initialization-sensitive, which can degrade the attainable performance. In this work, we propose MCRMO-Attack, which stabilizes supervision via Multi-Crop Aggregation with an Attention-Guided Crop, improves token-level reliability through alignability-gated Token Routing, and meta-learns a cross-target perturbation prior that yields stronger per-target solutions. Across commercial MLLMs, we boost unseen-image attack success rate by +23.7% on GPT-4o and +19.9% on Gemini-2.0 over the strongest universal baseline.
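The multi-crop supervision idea can be sketched as follows: average a target-matching loss over several random crops plus one crop centred on an attention peak, so a single unlucky crop does not dominate the gradient. This is a minimal numpy sketch, not the paper's implementation; the `embed` callable, the `attn` map, and the crop sizes are all placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(img, size):
    """Sample a random square crop of side `size` from an HxWxC image."""
    h, w, _ = img.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def attention_guided_crop(img, attn, size):
    """Crop centred on the peak of a (hypothetical) HxW attention map."""
    h, w, _ = img.shape
    cy, cx = np.unravel_index(np.argmax(attn), attn.shape)
    top = int(np.clip(cy - size // 2, 0, h - size))
    left = int(np.clip(cx - size // 2, 0, w - size))
    return img[top:top + size, left:left + size]

def multi_crop_loss(img, attn, target_feat, embed, size=64, n_random=4):
    """Average a cosine target-matching loss over several random crops
    (multi-crop aggregation), always including one attention-guided crop.
    `embed` stands in for a surrogate vision encoder."""
    crops = [random_crop(img, size) for _ in range(n_random)]
    crops.append(attention_guided_crop(img, attn, size))
    feats = np.stack([embed(c) for c in crops])           # (n+1, d)
    feats /= np.linalg.norm(feats, axis=1, keepdims=True)
    t = target_feat / np.linalg.norm(target_feat)
    return float(np.mean(1.0 - feats @ t))                # mean cosine distance
```

Averaging over crops lowers the variance of the supervision signal at the cost of extra encoder passes per optimization step.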
Key Contributions
- Defines the UTTAA (Universal Targeted Transferable Adversarial Attacks) setting: a single perturbation steers arbitrary unseen inputs toward a specified target on closed-source MLLMs
- Multi-Crop Aggregation with Attention-Guided Crop (MCA+AGC) to reduce target supervision variance from crop randomness
- Alignability-gated Token Routing for reliable token-level matching under universality
- Meta-learned cross-target perturbation prior that makes few-shot per-target adaptation robust to initialization
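The gating idea behind token routing can be illustrated with a short sketch: match each adversarial-image token to its most similar target token, then let only pairs whose similarity clears a threshold contribute to the loss, dropping tokens that cannot be reliably aligned. This is an illustrative reconstruction under stated assumptions; the threshold `tau` and the fallback value are placeholders, not the paper's settings.

```python
import numpy as np

def routed_token_loss(adv_tokens, tgt_tokens, tau=0.5):
    """Alignability-gated token matching (hypothetical sketch).

    adv_tokens: (Na, d) token features from the perturbed image.
    tgt_tokens: (Nt, d) token features from the target.
    Only adversarial tokens whose best cosine match exceeds `tau`
    contribute; unreliable tokens are routed out of the loss."""
    a = adv_tokens / np.linalg.norm(adv_tokens, axis=1, keepdims=True)
    t = tgt_tokens / np.linalg.norm(tgt_tokens, axis=1, keepdims=True)
    sim = a @ t.T                  # (Na, Nt) pairwise cosine similarities
    best = sim.max(axis=1)         # best target match per adversarial token
    keep = best >= tau             # the alignability gate
    if not keep.any():             # no token is reliably alignable
        return 1.0                 # placeholder fallback loss
    return float(np.mean(1.0 - best[keep]))
```

The gate matters in the universal setting because, as the abstract notes, universality suppresses the image-specific cues that would otherwise anchor token alignment.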
🛡️ Threat Analysis
Proposes gradient-based universal adversarial perturbations applied to input images that steer closed-source MLLMs toward attacker-specified targeted outputs at inference time, a direct input manipulation attack.
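The attack primitive itself, one shared perturbation optimized over a batch and reused on arbitrary images, follows the standard universal-PGD recipe. Below is a minimal numpy sketch under that assumption; `grad_fn`, the step size `alpha`, and the L∞ budget `eps` are illustrative stand-ins for the surrogate model's loss gradient and the paper's hyperparameters.

```python
import numpy as np

def apply_universal(images, delta, eps=8 / 255):
    """Add one shared perturbation `delta`, clipped to an L-infinity
    ball of radius `eps`, to every image, then clamp to [0, 1]."""
    d = np.clip(delta, -eps, eps)
    return np.clip(images + d, 0.0, 1.0)

def pgd_universal_step(images, delta, grad_fn, alpha=1 / 255, eps=8 / 255):
    """One signed-gradient step on the shared perturbation, averaging
    per-image gradients so the update helps the whole batch.
    `grad_fn` is a placeholder for the surrogate loss gradient."""
    grads = np.stack([grad_fn(apply_universal(x[None], delta)[0])
                      for x in images])
    delta = delta - alpha * np.sign(grads.mean(axis=0))   # descend the target loss
    return np.clip(delta, -eps, eps)                      # stay in the budget
```

Because `delta` is optimized once and then added to any input, a successful perturbation is reusable across unseen images, which is what makes the universal setting a sharper threat than sample-wise attacks.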