Induced Numerical Instability: Hidden Costs in Multimodal Large Language Models
Wai Tuck Wong 1, Jun Sun 1, Arunesh Sinha 2
Published on arXiv
2603.04453
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Adversarial images optimized for numerical instability cause significant performance degradation across six VLM benchmarks with only small perturbations to the input image, representing a failure mode not captured by standard adversarial perturbation methods.
Induced Numerical Instability Attack
Novel technique introduced
The use of multimodal large language models has become widespread, and as such the study of these models and their failure points has become of utmost importance. We study a novel mode of failure that indirectly degrades performance by optimizing a loss term that seeks to maximize numerical instability in the inference stage of these models. We apply this loss term as the optimization target to construct images that, when fed to multimodal large language models, cause significant degradation in the output. We validate our hypothesis on state-of-the-art large vision-language models (LLaVA-v1.5-7B, Idefics3-8B, SmolVLM-2B-Instruct) against standard datasets (Flickr30k, MMVet, TextVQA, VQAv2, POPE, COCO) and show that performance degrades significantly compared to baselines, even with a very small change to the input image. Our results uncover a fundamentally different vector of performance degradation, highlighting a failure mode not captured by adversarial perturbations.
Key Contributions
- Novel numerical instability loss term that, when optimized, constructs adversarial images causing performance degradation in VLMs distinct from traditional adversarial perturbation objectives
- Empirical validation across three state-of-the-art VLMs (LLaVA-v1.5-7B, Idefics3-8B, SmolVLM-2B-Instruct) and six benchmarks (Flickr30k, MMVet, TextVQA, VQAv2, POPE, COCO)
- Uncovers a previously understudied failure mode in multimodal LLMs tied to floating-point numerical instability at inference time, rather than to traditional adversarial evasion objectives
🛡️ Threat Analysis
The paper optimizes a loss term to construct adversarial visual inputs that significantly degrade model output at inference time. This is a gradient-based adversarial image attack targeting VLMs: the small, imperceptible perturbations are reminiscent of classic adversarial examples, but the mechanism exploited is numerical instability rather than misclassification.
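The attack loop described above can be sketched as a PGD-style optimization. This is a minimal illustrative sketch, not the paper's implementation: the tiny MLP stands in for a real VLM vision encoder, and the mean absolute-activation objective is an assumed proxy for the paper's numerical-instability loss term.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a VLM's vision encoder; the paper targets real
# models (LLaVA-v1.5-7B, Idefics3-8B, SmolVLM-2B-Instruct).
encoder = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 8 * 8, 32),
    nn.GELU(),
    nn.Linear(32, 16),
)

def instability_attack(x, steps=20, eps=8 / 255, alpha=2 / 255):
    """PGD-style gradient ascent on an assumed instability proxy:
    the mean magnitude of encoder activations, pushing values toward
    the unstable end of the floating-point range, while keeping the
    perturbation within an L-infinity ball of radius eps."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = encoder(x + delta).abs().mean()  # proxy instability objective
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # ascend the loss
            delta.clamp_(-eps, eps)             # keep the change small
        delta.grad.zero_()
    # Clamp back to a valid image range.
    return (x + delta).clamp(0.0, 1.0).detach()

x = torch.rand(1, 3, 8, 8)   # toy "image" in [0, 1]
x_adv = instability_attack(x)
```

The key design point mirrored from the paper is the optimization target: rather than flipping a label, the loss rewards inputs that push inference-time computation toward numerically unstable regimes.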