Probabilistic Robustness for Free? Revisiting Training via a Benchmark
Yi Zhang 1, Zheng Wang 1, Zhen Chen 2, Wenjie Ruan 2, Qing Guo 3,4, Siddartha Khastgir 1, Carsten Maple 1, Xingyu Zhao 1
Published on arXiv: 2511.01724
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Adversarial training (AT) methods outperform PR-targeted training in improving both adversarial and probabilistic robustness in most settings at no extra cost, while PR-targeted methods offer lower generalization error and higher clean accuracy.
PRBench
Novel technique introduced
Deep learning models are notoriously vulnerable to imperceptible perturbations. Most existing research centers on adversarial robustness (AR), which evaluates models under worst-case scenarios by examining the existence of deterministic adversarial examples (AEs). In contrast, probabilistic robustness (PR) adopts a statistical perspective, measuring the probability that predictions remain correct under stochastic perturbations. While PR is widely regarded as a practical complement to AR, dedicated training methods for improving PR are still relatively underexplored, albeit with emerging progress. Among the few PR-targeted training methods, we identify three limitations: (i) non-comparable evaluation protocols; (ii) limited comparisons to strong AT baselines despite anecdotal PR gains from AT; and (iii) no unified framework to compare the generalization of these methods. Thus, we introduce PRBench, the first benchmark dedicated to evaluating improvements in PR achieved by different robustness training methods. PRBench empirically compares the most common AT and PR-targeted training methods using a comprehensive set of metrics, including clean accuracy, PR and AR performance, training efficiency, and generalization error (GE). We also provide theoretical analysis on the GE of PR performance across different training methods. Main findings revealed by PRBench include: AT methods are more versatile than PR-targeted training methods in terms of improving both AR and PR performance across diverse hyperparameter settings, while PR-targeted training methods consistently yield lower GE and higher clean accuracy. A leaderboard comprising 222 trained models across 7 datasets and 10 model architectures is publicly available at https://tmpspace.github.io/PRBenchLeaderboard/.
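To make the statistical definition of PR concrete, the following is a minimal sketch of how PR can be estimated by Monte Carlo sampling: draw random perturbations from an L-infinity ball and count how often the prediction stays correct. The function and parameter names (`estimate_pr`, `model_predict`, `eps`, `n_samples`) are illustrative assumptions, not PRBench's actual API or evaluation protocol.

```python
import numpy as np

def estimate_pr(model_predict, x, true_label, eps=0.03, n_samples=1000, seed=0):
    """Monte Carlo estimate of probabilistic robustness: the fraction of
    uniformly sampled perturbations within an L-inf ball of radius eps
    under which the model's prediction remains correct.

    Illustrative sketch only -- not PRBench's evaluation code.
    """
    rng = np.random.default_rng(seed)
    correct = 0
    for _ in range(n_samples):
        delta = rng.uniform(-eps, eps, size=x.shape)  # stochastic perturbation
        x_pert = np.clip(x + delta, 0.0, 1.0)         # stay in valid input range
        if model_predict(x_pert) == true_label:
            correct += 1
    return correct / n_samples
```

Note the contrast with AR: AR asks whether *any* perturbation in the ball flips the prediction (worst case), while this estimator measures the *probability* of a flip under random perturbations.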
Key Contributions
- PRBench: the first benchmark dedicated to evaluating training methods for improving probabilistic robustness (PR), covering 222 models across 7 datasets and 10 architectures
- Empirical comparison of 13 training methods (AT and PR-targeted) using a comprehensive metric suite including clean accuracy, PR, AR, training efficiency, and generalization error
- Theoretical analysis of generalization error bounds under a unified Uniform Stability Analysis framework, showing PR-targeted methods generalize better while AT provides PR gains 'for free'
🛡️ Threat Analysis
PRBench systematically evaluates training methods (adversarial training and PR-targeted training) designed to defend against adversarial examples and stochastic perturbations at inference time — directly addressing the input manipulation threat. It uses 4 adversarial attacks to measure AR performance across 222 trained models.
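The source does not name the 4 attacks PRBench uses to measure AR, so as a generic illustration of the input-manipulation threat, here is a sketch of the Fast Gradient Sign Method (FGSM) on a logistic-regression model, where the input gradient is available in closed form. The function name `fgsm` and all parameters are assumptions for this example, not part of the benchmark.

```python
import numpy as np

def fgsm(w, b, x, y, eps=0.1):
    """One-step FGSM adversarial perturbation within an L-inf ball:
    move each input coordinate by eps in the direction that increases
    the binary cross-entropy loss of a logistic-regression model.

    Illustrative sketch only -- not one of PRBench's four attacks.
    """
    logit = w @ x + b
    p = 1.0 / (1.0 + np.exp(-logit))   # predicted probability of class 1
    grad_x = (p - y) * w               # d(BCE loss)/dx in closed form
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)
```

Adversarial training defends against exactly such inputs by generating them during training and minimizing the loss on the perturbed examples; the paper's key finding is that this also improves PR "for free".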