GF-Score: Certified Class-Conditional Robustness Evaluation with Fairness Guarantees
Arya Shah, Kaveri Visavadiya, Manisha Padala
Published on arXiv
2604.12757
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Reveals consistent vulnerability patterns (e.g., 'cat' is the weakest class in 76% of CIFAR-10 models) and shows that more robust models exhibit greater per-class robustness inequality
GF-Score
Novel technique introduced
Adversarial robustness is essential for deploying neural networks in safety-critical applications, yet standard evaluation methods either require expensive adversarial attacks or report only a single aggregate score that obscures how robustness is distributed across classes. We introduce the GF-Score (GREAT-Fairness Score), a framework that decomposes the certified GREAT Score into per-class robustness profiles and quantifies their disparity through four metrics grounded in welfare economics: the Robustness Disparity Index (RDI), the Normalized Robustness Gini Coefficient (NRGC), Worst-Case Class Robustness (WCR), and a Fairness-Penalized GREAT Score (FP-GREAT). The framework further eliminates the original method's dependence on adversarial attacks through a self-calibration procedure that tunes the temperature parameter using only clean accuracy correlations. Evaluating 22 models from RobustBench across CIFAR-10 and ImageNet, we find that the decomposition is exact, that per-class scores reveal consistent vulnerability patterns (e.g., "cat" is the weakest class in 76% of CIFAR-10 models), and that more robust models tend to exhibit greater class-level disparity. These results establish a practical, attack-free auditing pipeline for diagnosing where certified robustness guarantees fail to protect all classes equally. We release our code on GitHub: https://github.com/aryashah2k/gf-score
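The abstract names four disparity metrics over per-class robustness scores but does not give their formulas. The sketch below uses plausible stand-in definitions (RDI as the max-min spread relative to the mean, NRGC as a standard Gini coefficient, WCR as the minimum class score, and FP-GREAT as the mean score discounted by the Gini penalty); the paper's exact definitions may differ.

```python
import numpy as np

def fairness_metrics(class_scores):
    """Illustrative fairness metrics over per-class certified robustness scores.

    These formulas are stand-ins, not the paper's definitions:
      RDI      - (max - min) / mean, the relative spread across classes
      NRGC     - Gini coefficient of the scores, inequality in [0, 1)
      WCR      - minimum per-class score, the worst-protected class
      FP-GREAT - mean score discounted by the Gini inequality penalty
    """
    s = np.asarray(class_scores, dtype=float)
    mean = s.mean()
    # Gini coefficient via mean absolute pairwise difference.
    gini = np.abs(s[:, None] - s[None, :]).mean() / (2 * mean)
    return {
        "RDI": (s.max() - s.min()) / mean,
        "NRGC": gini,
        "WCR": s.min(),
        "FP-GREAT": mean * (1 - gini),
    }

# Hypothetical per-class scores for a 10-class model (e.g. CIFAR-10),
# with a weak "cat"-like class dragging down the worst case.
scores = [0.62, 0.55, 0.48, 0.31, 0.58, 0.50, 0.57, 0.60, 0.64, 0.59]
metrics = fairness_metrics(scores)
```

Note that under these stand-ins FP-GREAT is always at most the aggregate mean, so a model with identical per-class scores pays no penalty.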
Key Contributions
- GF-Score framework decomposing aggregate certified robustness into per-class profiles with four fairness metrics (RDI, NRGC, WCR, FP-GREAT)
- Self-calibration procedure eliminating dependence on adversarial attacks by tuning temperature using clean accuracy correlations
- Empirical finding that higher overall robustness correlates with greater class-level disparity across 22 RobustBench models
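The self-calibration contribution can be sketched as a one-dimensional search: choose the temperature whose attack-free scores correlate best with clean accuracy across a model pool. The paper's scoring function and correlation measure are not specified in this summary, so the sketch below takes an arbitrary `score_fn` and uses Pearson correlation as a stand-in.

```python
import numpy as np

def calibrate_temperature(score_fn, clean_accs, temps):
    """Pick the temperature whose scores best track clean accuracy.

    score_fn(t) is assumed to return one attack-free robustness score per
    model at temperature t; Pearson correlation is a stand-in for whatever
    correlation the paper's procedure actually uses.
    """
    clean_accs = np.asarray(clean_accs, dtype=float)
    best_t, best_r = None, -np.inf
    for t in temps:
        scores = np.asarray(score_fn(t), dtype=float)
        r = np.corrcoef(scores, clean_accs)[0, 1]  # Pearson correlation
        if r > best_r:
            best_t, best_r = t, r
    return best_t, best_r

# Synthetic example: scores align with clean accuracy only near t = 1.
accs = np.array([0.70, 0.80, 0.90])

def score_fn(t):
    # Deviating from t = 1 tilts scores against the accuracy ordering.
    return accs + abs(t - 1.0) * np.array([0.3, 0.0, -0.3])

best_t, best_r = calibrate_temperature(score_fn, accs, [0.5, 1.0, 2.0])
```

In practice the temperature grid and model pool would come from the RobustBench models the paper evaluates.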
🛡️ Threat Analysis
The framework evaluates certified robustness against adversarial examples (input manipulation attacks) and provides an attack-free alternative to empirical adversarial testing. The GREAT Score and its decomposition quantify certified guarantees against bounded perturbations at inference time.