Induced Numerical Instability: Hidden Costs in Multimodal Large Language Models

Wai Tuck Wong 1, Jun Sun 1, Arunesh Sinha 2

Published on arXiv: 2603.04453

Input Manipulation Attack

OWASP ML Top 10 — ML01

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Adversarial images optimized for numerical instability cause significant performance degradation across six VLM benchmarks with only small perturbations to the input image, representing a failure mode not captured by standard adversarial perturbation methods.

Induced Numerical Instability Attack

Novel technique introduced


The use of multimodal large language models has become widespread, and as such the study of these models and their failure points has become of utmost importance. We study a novel mode of failure that degrades performance indirectly, by optimizing a loss term that seeks to maximize numerical instability in the inference stage of these models. We apply this loss term as the optimization target to construct images that, when fed to multimodal large language models, cause significant degradation in the output. We validate our hypothesis on state-of-the-art large vision-language models (LLaVA-v1.5-7B, Idefics3-8B, SmolVLM-2B-Instruct) against standard datasets (Flickr30k, MMVet, TextVQA, VQAv2, POPE, COCO) and show that performance degrades significantly compared to baselines, even with a very small change to the input image. Our results uncover a fundamentally different vector of performance degradation, highlighting a failure mode not captured by adversarial perturbations.
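The abstract's core recipe is a projected gradient ascent on the input image against an instability objective. Below is a minimal, hedged sketch of that pattern using numpy and a toy linear "encoder"; the loss (`instability_loss`), the model stand-in `W`, and all hyperparameters are illustrative assumptions, not the paper's actual objective or pipeline. The idea it demonstrates: ascend a loss that rewards large intermediate activations, which is one plausible way to push downstream low-precision computations toward instability, while staying inside an L-infinity budget.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a vision encoder: a fixed linear map producing "logits".
# (Hypothetical -- the paper attacks real VLM inference, not this toy.)
W = rng.standard_normal((8, 16))

def instability_loss(x):
    """Illustrative objective: magnitude of pre-activation logits.
    Driving this up pushes later exp()/softmax computations toward
    low-precision overflow. Not the paper's actual loss term."""
    z = W @ x
    return float(z @ z)

def loss_grad(x):
    # Analytic gradient of the toy loss: d/dx ||Wx||^2 = 2 W^T W x
    return 2.0 * W.T @ (W @ x)

def pgd_maximize(x0, eps=8 / 255, step=2 / 255, iters=40):
    """L-inf PGD that *maximizes* the loss within an eps-ball around x0."""
    x = x0.copy()
    for _ in range(iters):
        x = x + step * np.sign(loss_grad(x))   # ascend, not descend
        x = np.clip(x, x0 - eps, x0 + eps)     # project back into the budget
        x = np.clip(x, 0.0, 1.0)               # keep pixels valid
    return x

x0 = rng.uniform(0.2, 0.8, size=16)  # stand-in "image" pixels in [0, 1]
x_adv = pgd_maximize(x0)
```

The sign-of-gradient step and projection are the standard PGD ingredients; what changes relative to a classic evasion attack is only the objective being ascended.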


Key Contributions

  • Novel numerical instability loss term that, when optimized, constructs adversarial images causing performance degradation in VLMs distinct from traditional adversarial perturbation objectives
  • Empirical validation across three state-of-the-art VLMs (LLaVA-v1.5-7B, Idefics3-8B, SmolVLM-2B-Instruct) and six benchmarks (Flickr30k, MMVet, TextVQA, VQAv2, POPE, COCO)
  • Uncovers a previously understudied failure mode in multimodal LLMs tied to floating-point numerical instability at inference time rather than traditional gradient-based evasion

🛡️ Threat Analysis

Input Manipulation Attack

The paper optimizes a loss term to construct adversarial visual inputs that cause significant degradation of model output at inference time. This is a gradient-based adversarial image attack targeting VLMs: the small, near-imperceptible perturbations are reminiscent of classic adversarial examples, but the mechanism exploited is numerical instability rather than decision-boundary evasion.


Details

Domains
vision, multimodal, nlp
Model Types
vlm, multimodal
Threat Tags
white_box, inference_time, untargeted, digital
Datasets
Flickr30k, MMVet, TextVQA, VQAv2, POPE, COCO
Applications
visual question answering, image captioning, multimodal reasoning