Get RICH or Die Scaling: Profitably Trading Inference Compute for Robustness
Tavish McDonald 1, Bo Lei 1, Stanislav Fort 2, Bhavya Kailkhura 1, Brian Bartoldson 1
Published on arXiv (arXiv:2510.06790)
- Input Manipulation Attack (OWASP ML Top 10 — ML01)
- Prompt Injection (OWASP LLM Top 10 — LLM01)
Key Finding
Robustifying the vision encoder of InternVL 3.5 gpt-oss 20B before scaling test-time compute yields significant adversarial accuracy gains on Attack-Bard, whereas scaling compute alone on the non-robust base model provides little benefit.
RICH (Robustness from Inference Compute Hypothesis)
Novel technique introduced
Abstract
Models are susceptible to adversarially out-of-distribution (OOD) data despite large training-compute investments in their robustification. Zaremba et al. (2025) make progress on this problem at test time, showing that LLM reasoning improves satisfaction of model specifications designed to thwart attacks, yielding a correlation between reasoning effort and robustness to jailbreaks. However, this benefit of test compute fades when attackers are given access to gradients or multimodal inputs. We address this gap, clarifying that inference compute offers benefits even in such cases. Our argument is that compositional generalization, through which OOD data is understandable via its in-distribution (ID) components, enables adherence to defensive specifications on adversarially OOD inputs. Namely, we posit the Robustness from Inference Compute Hypothesis (RICH): inference-compute defenses profit as the model's training data better reflects the attacked data's components. We empirically support this hypothesis across vision-language models and attack types, finding robustness gains from test-time compute when specification following on OOD data is unlocked by compositional generalization. For example, InternVL 3.5 gpt-oss 20B gains little robustness when its test compute is scaled, but such scaling adds significant robustness if we first robustify its vision encoder. This correlation of inference compute's robustness benefit with base-model robustness is the rich-get-richer dynamic of the RICH: attacked data components are more ID for robustified models, aiding compositional generalization to OOD data. Thus, we advise layering train-time and test-time defenses to obtain their synergistic benefit.
Key Contributions
- Proposes the Robustness from Inference Compute Hypothesis (RICH): inference-compute defenses are more effective when model training data better reflects the components of attacked data, enabling compositional generalization to adversarially OOD inputs.
- Empirically demonstrates that robustifying the vision encoder (e.g., via adversarial fine-tuning of ViT in InternVL 3.5 gpt-oss 20B) unlocks significant robustness gains from inference-time compute scaling against gradient-based visual attacks.
- Advises layering train-time adversarial robustification with test-time compute scaling (CoT/reasoning) to obtain synergistic robustness benefits beyond either approach alone.
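The train-time half of this layering — adversarial fine-tuning of the encoder — follows the standard min-max recipe: an inner maximization generates a perturbed input against the current model, and the outer minimization trains on it. The sketch below illustrates that recipe on a toy NumPy logistic model standing in for the vision encoder; the model, data, FGSM inner step, and hyperparameters are illustrative assumptions, not the paper's actual InternVL/ViT setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for the vision encoder: a logistic model on 8-dim inputs.
# (The paper adversarially fine-tunes a ViT; the min-max recipe is the same.)
w_true = rng.normal(size=8)
X = rng.normal(size=(200, 8))
Y = (X @ w_true > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_grad(w, x, y):
    # dLoss/dx for binary cross-entropy on a linear logit
    return (sigmoid(x @ w) - y) * w

def weight_grad(w, x, y):
    # dLoss/dw for the same loss
    return (sigmoid(x @ w) - y) * x

def fgsm(w, x, y, eps=0.1):
    # One-step sign attack: the inner maximization of adversarial training
    return x + eps * np.sign(input_grad(w, x, y))

# Outer minimization: SGD on adversarial examples generated on the fly
w, lr = np.zeros(8), 0.1
for epoch in range(30):
    for x, y in zip(X, Y):
        x_adv = fgsm(w, x, y)               # attack the current model
        w -= lr * weight_grad(w, x_adv, y)  # train on the attacked input

# Robust accuracy under the same attack
robust_acc = np.mean([
    (sigmoid(fgsm(w, x, y) @ w) > 0.5) == bool(y) for x, y in zip(X, Y)
])
print(round(float(robust_acc), 3))
```

Per the paper's advice, this train-time robustification is then paired with scaled test-time compute (longer reasoning budgets); the loop above sketches only the train-time half.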
🛡️ Threat Analysis
The paper directly studies gradient-based adversarial attacks (white-box and transfer) on VLMs and proposes inference-time compute scaling as a defense against these input manipulation attacks; adversarial visual perturbations are the primary attack vector evaluated.
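A gradient-based input manipulation attack of this kind can be sketched as L-infinity projected gradient descent (PGD): repeatedly step the input in the gradient-sign direction that increases the loss, then project back into an eps-ball around the clean input. The toy below uses a NumPy logistic model with hand-derived gradients as an illustrative stand-in for a VLM's vision encoder; the model and hyperparameters are assumptions for demonstration, not the paper's attack configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical differentiable "model": logistic regression on a flat input.
w = rng.normal(size=16)

def loss_and_grad(x, y):
    """Binary cross-entropy loss and its gradient w.r.t. the input x."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad_x = (p - y) * w  # dLoss/dx for a linear logit
    return loss, grad_x

def pgd_attack(x, y, eps=0.1, alpha=0.02, steps=20):
    """L-inf PGD: maximize the loss while staying within an eps-ball of x."""
    x_adv = x.copy()
    for _ in range(steps):
        _, g = loss_and_grad(x_adv, y)
        x_adv = x_adv + alpha * np.sign(g)        # gradient-ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back to the ball
    return x_adv

x = rng.normal(size=16)
y = 1.0
clean_loss, _ = loss_and_grad(x, y)
x_adv = pgd_attack(x, y)
adv_loss, _ = loss_and_grad(x_adv, y)
print(adv_loss > clean_loss)                    # the attack raises the loss
print(np.max(np.abs(x_adv - x)) <= 0.1 + 1e-9)  # perturbation stays bounded
```

The paper's finding is that defending against such perturbations purely with test-time compute is ineffective unless the attacked components are first made in-distribution via train-time robustification of the encoder.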