Get Global Guarantees: On the Probabilistic Nature of Perturbation Robustness
Published on arXiv
2508.19183
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Tower robustness covers the full perturbation vicinity while maintaining precision, unlike existing probabilistic methods that risk overestimating robustness by missing critical adversarial instances.
Tower Robustness
Novel technique introduced
In safety-critical deep learning applications, robustness measures a neural model's ability to handle imperceptible perturbations in input data, which can otherwise lead to safety hazards. Existing pre-deployment robustness assessment methods typically suffer from significant trade-offs between computational cost and measurement precision, limiting their practical utility. To address these limitations, this paper conducts a comprehensive comparative analysis of existing robustness definitions and their associated assessment methodologies. We propose tower robustness, a novel, practical metric based on hypothesis testing that quantitatively evaluates probabilistic robustness, enabling more rigorous and efficient pre-deployment assessments. Our extensive comparative evaluation illustrates the advantages and applicability of the proposed approach, advancing the systematic understanding and enhancement of model robustness in safety-critical deep learning applications.
Key Contributions
- Proposes 'tower robustness,' a novel probabilistic robustness metric grounded in hypothesis testing that provides statistical guarantees on failure probability estimates
- Conducts comprehensive comparative analysis of existing robustness definitions and pre-deployment assessment methodologies, highlighting precision–cost trade-offs
- Demonstrates that the proposed method yields more accurate and reliable robustness estimates than state-of-the-art baselines on large-scale DNNs
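To make the hypothesis-testing idea concrete, here is a minimal sketch of a sampling-based probabilistic robustness check. It is not the paper's exact tower robustness test: the function name, the toy model, and the use of a one-sided Hoeffding concentration bound are illustrative assumptions. The sketch samples random perturbations in an L-infinity ε-ball, estimates the empirical failure (misclassification) rate, and returns an upper confidence bound on the true failure probability, which can then be compared against a tolerated failure rate.

```python
import math
import random

def probabilistic_robustness_bound(model, x, label, eps, n=1000, delta=0.01, rng=None):
    """Estimate an upper confidence bound on the misclassification probability
    under uniform perturbations drawn from an L-infinity eps-ball around x.

    Returns (empirical_failure_rate, hoeffding_upper_bound).
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    failures = 0
    for _ in range(n):
        # Sample a perturbation uniformly from [-eps, eps]^d and apply it.
        x_pert = [xi + rng.uniform(-eps, eps) for xi in x]
        if model(x_pert) != label:
            failures += 1
    p_hat = failures / n
    # One-sided Hoeffding bound: with probability >= 1 - delta, the true
    # failure probability is at most p_hat + sqrt(ln(1/delta) / (2n)).
    bound = p_hat + math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return p_hat, min(bound, 1.0)

# Toy linear classifier (hypothetical): predicts 1 iff the coordinate sum is positive.
toy_model = lambda x: int(sum(x) > 0.0)

p_hat, ub = probabilistic_robustness_bound(toy_model, x=[0.6, 0.5], label=1, eps=0.1)
robust = ub <= 0.05  # declare probabilistically robust if the bound stays below a tolerated failure rate
```

The decision rule mirrors the pre-deployment setting the paper targets: rather than certifying every point in the vicinity (expensive) or reporting a raw empirical rate (no guarantee), the sample size `n` and confidence parameter `delta` trade computational cost against the tightness of the statistical guarantee.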
🛡️ Threat Analysis
The paper's contribution centers on evaluating and quantifying model robustness against adversarial perturbations (imperceptible input manipulations): tower robustness is a new metric for measuring resistance to adversarial examples at inference time.