
Adversarial Robustness of Vision in Open Foundation Models

Jonathon Fox, William J. Buchanan, Pavlos Papadopoulos

0 citations · 68 references · IEEE Access


Published on arXiv: 2512.17902

Input Manipulation Attack

OWASP ML Top 10 — ML01

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Llama 3.2 Vision shows smaller accuracy drop under PGD attack than LLaVA at higher perturbation levels, despite having lower baseline VQA accuracy, suggesting robustness is architecturally determined rather than accuracy-correlated

PGD (Projected Gradient Descent)

Established attack technique applied to open-weight VLMs


As deep learning models grow more complex, it becomes increasingly difficult to understand how AI systems identify objects. An adversary can exploit this opacity by modifying an image with perturbations that are imperceptible to humans yet confuse the model's recognition of an entity. This paper therefore investigates the adversarial robustness of LLaVA-1.5-13B and Meta's Llama 3.2 Vision-8B. Both models are subjected to untargeted PGD (Projected Gradient Descent) attacks against the visual input modality and empirically evaluated on a subset of the Visual Question Answering (VQA) v2 dataset. The impact of these adversarial attacks is quantified using the standard VQA accuracy metric, and the accuracy degradation (accuracy drop) of LLaVA and Llama 3.2 Vision is compared. A key finding is that Llama 3.2 Vision, despite a lower baseline accuracy in this setup, exhibited a smaller drop in performance under attack than LLaVA, particularly at higher perturbation levels. Overall, the findings confirm that the vision modality is a viable attack vector for degrading the performance of contemporary open-weight VLMs, including Meta's Llama 3.2 Vision. They further highlight that adversarial robustness does not necessarily correlate with standard benchmark performance and may be influenced by underlying architectural and training factors.
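The standard VQA accuracy metric mentioned above credits a predicted answer according to how many of the ten human annotators gave the same answer. A common simplified form is sketched below (the official metric additionally averages over subsets of nine annotators and normalizes answer strings before matching; the function name here is illustrative, not from the paper):

```python
def vqa_accuracy(prediction: str, human_answers: list[str]) -> float:
    """Simplified VQA accuracy: an answer is fully correct if at least
    three human annotators gave it, and partially credited otherwise."""
    matches = sum(answer == prediction for answer in human_answers)
    return min(matches / 3.0, 1.0)
```

For example, a prediction matching all ten annotators scores 1.0, while one matching a single annotator scores roughly 0.33. Accuracy drop under attack is then the difference between this score on clean and perturbed images, averaged over the evaluation subset.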


Key Contributions

  • Empirical comparison of PGD adversarial robustness between LLaVA-1.5-13B and Llama 3.2 Vision-8B on VQA v2
  • Finding that adversarial robustness does not correlate with standard benchmark accuracy in open-weight VLMs
  • Evidence that Llama 3.2 Vision maintains relatively higher robustness than LLaVA at higher perturbation budgets despite lower baseline accuracy

🛡️ Threat Analysis

Input Manipulation Attack

Applies PGD gradient-based adversarial perturbations to visual inputs of VLMs at inference time, causing accuracy degradation — classic input manipulation / adversarial example attack.
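The untargeted PGD attack described above can be sketched in a minimal, framework-agnostic form. The example below uses NumPy with a caller-supplied `grad_fn`; in the paper's setting that gradient would come from backpropagating the VQA loss through the VLM's vision encoder, which is elided here (the function signature and default budgets are illustrative assumptions, not the authors' exact configuration):

```python
import numpy as np

def pgd_untargeted(x, grad_fn, eps=8 / 255, alpha=2 / 255, steps=10):
    """Untargeted L-infinity PGD: repeatedly step in the direction of the
    signed loss gradient, projecting back into the eps-ball around the
    clean input x and clipping to the valid pixel range [0, 1]."""
    x_adv = x.copy()
    for _ in range(steps):
        g = grad_fn(x_adv)                        # loss gradient w.r.t. input
        x_adv = x_adv + alpha * np.sign(g)        # signed gradient ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep valid pixel values
    return x_adv
```

The perturbation budget `eps` corresponds to the "perturbation levels" the paper sweeps when comparing the accuracy drop of LLaVA and Llama 3.2 Vision.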


Details

Domains
vision · multimodal
Model Types
vlm · transformer
Threat Tags
white_box · inference_time · untargeted · digital
Datasets
VQA v2
Applications
visual question answering · vision-language models