Get RICH or Die Scaling: Profitably Trading Inference Compute for Robustness
Tavish McDonald 1, Bo Lei 1, Stanislav Fort 2, Bhavya Kailkhura 1, Brian Bartoldson 1
Published on arXiv (arXiv:2510.06790)
- Input Manipulation Attack (OWASP ML Top 10 — ML01)
- Prompt Injection (OWASP LLM Top 10 — LLM01)
Key Finding
Robustifying the vision encoder of InternVL 3.5 gpt-oss 20B before scaling test-time compute yields significant adversarial accuracy gains on Attack-Bard, whereas scaling compute alone on the non-robust base model provides little benefit.
RICH (Robustness from Inference Compute Hypothesis)
Novel technique introduced
Abstract
Models are susceptible to adversarially out-of-distribution (OOD) data despite large training-compute investments in their robustification. Zaremba et al. (2025) make progress on this problem at test time, showing that LLM reasoning improves satisfaction of model specifications designed to thwart attacks, yielding a correlation between reasoning effort and robustness to jailbreaks. However, this benefit of test compute fades when attackers are given access to gradients or multimodal inputs. We address this gap, clarifying that inference compute offers benefits even in such cases. Our argument is that compositional generalization, through which OOD data is understandable via its in-distribution (ID) components, enables adherence to defensive specifications on adversarially OOD inputs. Namely, we posit the Robustness from Inference Compute Hypothesis (RICH): inference-compute defenses profit as the model's training data better reflects the attacked data's components. We empirically support this hypothesis across vision-language models and attack types, finding robustness gains from test-time compute when specification following on OOD data is unlocked by compositional generalization. For example, InternVL 3.5 gpt-oss 20B gains little robustness when its test compute is scaled, but such scaling adds significant robustness if we first robustify its vision encoder. This correlation of inference compute's robustness benefit with base-model robustness is the rich-get-richer dynamic of the RICH: attacked data components are more ID for robustified models, aiding compositional generalization to OOD data. Thus, we advise layering train-time and test-time defenses to obtain their synergistic benefit.
Key Contributions
- Proposes the Robustness from Inference Compute Hypothesis (RICH): inference-compute defenses are more effective when model training data better reflects the components of attacked data, enabling compositional generalization to adversarially OOD inputs.
- Empirically demonstrates that robustifying the vision encoder (e.g., via adversarial fine-tuning of ViT in InternVL 3.5 gpt-oss 20B) unlocks significant robustness gains from inference-time compute scaling against gradient-based visual attacks.
- Advises layering train-time adversarial robustification with test-time compute scaling (CoT/reasoning) to obtain synergistic robustness benefits beyond either approach alone.
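The train-time half of this layering — adversarial fine-tuning of the encoder — follows the standard min-max recipe: an inner maximization generates a perturbed input against the current model, and the outer minimization trains on it. The sketch below illustrates that recipe on a toy NumPy logistic model standing in for the vision encoder; the model, data, FGSM inner step, and hyperparameters are illustrative assumptions, not the paper's actual InternVL/ViT setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for the vision encoder: a logistic model on 8-dim inputs.
# (The paper adversarially fine-tunes a ViT; the min-max recipe is the same.)
w_true = rng.normal(size=8)
X = rng.normal(size=(200, 8))
Y = (X @ w_true > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_grad(w, x, y):
    # dLoss/dx for binary cross-entropy on a linear logit
    return (sigmoid(x @ w) - y) * w

def weight_grad(w, x, y):
    # dLoss/dw for the same loss
    return (sigmoid(x @ w) - y) * x

def fgsm(w, x, y, eps=0.1):
    # One-step sign attack: the inner maximization of adversarial training
    return x + eps * np.sign(input_grad(w, x, y))

# Outer minimization: SGD on adversarial examples generated on the fly
w, lr = np.zeros(8), 0.1
for epoch in range(30):
    for x, y in zip(X, Y):
        x_adv = fgsm(w, x, y)               # attack the current model
        w -= lr * weight_grad(w, x_adv, y)  # train on the attacked input

# Robust accuracy under the same attack
robust_acc = np.mean([
    (sigmoid(fgsm(w, x, y) @ w) > 0.5) == bool(y) for x, y in zip(X, Y)
])
print(round(float(robust_acc), 3))
```

Per the paper's advice, this train-time robustification is then paired with scaled test-time compute (longer reasoning budgets); the loop above sketches only the train-time half.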
🛡️ Threat Analysis
The paper directly studies gradient-based adversarial attacks (white-box and transfer) on VLMs and proposes inference-time compute scaling as a defense against these input manipulation attacks; adversarial visual perturbations are the primary attack vector evaluated.
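A gradient-based input manipulation attack of this kind can be sketched as L-infinity projected gradient descent (PGD): repeatedly step the input in the gradient-sign direction that increases the loss, then project back into an eps-ball around the clean input. The toy below uses a NumPy logistic model with hand-derived gradients as an illustrative stand-in for a VLM's vision encoder; the model and hyperparameters are assumptions for demonstration, not the paper's attack configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical differentiable "model": logistic regression on a flat input.
w = rng.normal(size=16)

def loss_and_grad(x, y):
    """Binary cross-entropy loss and its gradient w.r.t. the input x."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad_x = (p - y) * w  # dLoss/dx for a linear logit
    return loss, grad_x

def pgd_attack(x, y, eps=0.1, alpha=0.02, steps=20):
    """L-inf PGD: maximize the loss while staying within an eps-ball of x."""
    x_adv = x.copy()
    for _ in range(steps):
        _, g = loss_and_grad(x_adv, y)
        x_adv = x_adv + alpha * np.sign(g)        # gradient-ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back to the ball
    return x_adv

x = rng.normal(size=16)
y = 1.0
clean_loss, _ = loss_and_grad(x, y)
x_adv = pgd_attack(x, y)
adv_loss, _ = loss_and_grad(x_adv, y)
print(adv_loss > clean_loss)                    # the attack raises the loss
print(np.max(np.abs(x_adv - x)) <= 0.1 + 1e-9)  # perturbation stays bounded
```

The paper's finding is that defending against such perturbations purely with test-time compute is ineffective unless the attacked components are first made in-distribution via train-time robustification of the encoder.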