Induced Numerical Instability: Hidden Costs in Multimodal Large Language Models
Wai Tuck Wong 1, Jun Sun 1, Arunesh Sinha 2
Published on arXiv
2603.04453
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Adversarial images optimized for numerical instability cause significant performance degradation across six VLM benchmarks with only small perturbations to the input image, representing a failure mode not captured by standard adversarial perturbation methods.
Induced Numerical Instability Attack
Novel technique introduced
The use of multimodal large language models has become widespread, and as such the study of these models and their failure points has become of utmost importance. We study a novel mode of failure that indirectly degrades performance by optimizing a loss term that seeks to maximize numerical instability in the inference stage of these models. We apply this loss term as the optimization target to construct images that, when fed to multimodal large language models, cause significant degradation in the output. We validate our hypothesis on state-of-the-art large vision-language models (LLaVA-v1.5-7B, Idefics3-8B, SmolVLM-2B-Instruct) against standard datasets (Flickr30k, MMVet, TextVQA, VQAv2, POPE, COCO) and show that performance degrades significantly compared to baselines, even with a very small change to the input image. Our results uncover a fundamentally different vector of performance degradation, highlighting a failure mode not captured by adversarial perturbations.
Key Contributions
- Novel numerical instability loss term that, when optimized, constructs adversarial images causing performance degradation in VLMs distinct from traditional adversarial perturbation objectives
- Empirical validation across three state-of-the-art VLMs (LLaVA-v1.5-7B, Idefics3-8B, SmolVLM-2B-Instruct) and six benchmarks (Flickr30k, MMVet, TextVQA, VQAv2, POPE, COCO)
- Uncovers a previously understudied failure mode in multimodal LLMs tied to floating-point numerical instability at inference time, rather than to traditional adversarial evasion objectives
🛡️ Threat Analysis
The paper optimizes a loss term to construct adversarial visual inputs that significantly degrade model output at inference time. This is a gradient-based adversarial image attack targeting VLMs: the small, imperceptible perturbations are reminiscent of classic adversarial examples, but the mechanism exploited is numerical instability rather than misclassification.
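The attack loop described above can be sketched as a PGD-style optimization. This is a minimal illustrative sketch, not the paper's implementation: the tiny MLP stands in for a real VLM vision encoder, and the mean absolute-activation objective is an assumed proxy for the paper's numerical-instability loss term.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a VLM's vision encoder; the paper targets real
# models (LLaVA-v1.5-7B, Idefics3-8B, SmolVLM-2B-Instruct).
encoder = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 8 * 8, 32),
    nn.GELU(),
    nn.Linear(32, 16),
)

def instability_attack(x, steps=20, eps=8 / 255, alpha=2 / 255):
    """PGD-style gradient ascent on an assumed instability proxy:
    the mean magnitude of encoder activations, pushing values toward
    the unstable end of the floating-point range, while keeping the
    perturbation within an L-infinity ball of radius eps."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = encoder(x + delta).abs().mean()  # proxy instability objective
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # ascend the loss
            delta.clamp_(-eps, eps)             # keep the change small
        delta.grad.zero_()
    # Clamp back to a valid image range.
    return (x + delta).clamp(0.0, 1.0).detach()

x = torch.rand(1, 3, 8, 8)   # toy "image" in [0, 1]
x_adv = instability_attack(x)
```

The key design point mirrored from the paper is the optimization target: rather than flipping a label, the loss rewards inputs that push inference-time computation toward numerically unstable regimes.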