Defense · 2025

Ensuring Calibration Robustness in Split Conformal Prediction Under Adversarial Attacks

Xunlei Qian, Yue Xing

0 citations · 62 references · arXiv


Published on arXiv · 2511.18562

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Calibration-time adversarial perturbation strength monotonically controls test-time coverage, and adversarial training produces tighter conformal prediction sets without sacrificing coverage validity.


Conformal prediction (CP) provides distribution-free, finite-sample coverage guarantees but critically relies on exchangeability, a condition often violated under distribution shift. We study the robustness of split conformal prediction under adversarial perturbations at test time, focusing on both coverage validity and the resulting prediction set size. Our theoretical analysis characterizes how the strength of adversarial perturbations during calibration affects coverage guarantees under adversarial test conditions. We further examine the impact of adversarial training at the model-training stage. Extensive experiments support our theory: (i) prediction coverage varies monotonically with the calibration-time attack strength, so a nonzero calibration-time attack can be used to predictably control coverage under adversarial tests; (ii) target coverage can hold over a range of test-time attacks: with a suitable calibration attack, coverage stays within any chosen tolerance band across a contiguous set of perturbation levels; and (iii) adversarial training at the model-training stage produces tighter prediction sets that retain high informativeness.
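
To make the pipeline concrete, below is a minimal sketch of split conformal prediction with a calibration-time adversarial perturbation. The single-step FGSM attack, the nonconformity score (one minus the true-class softmax probability), and all function names are illustrative assumptions for a generic PyTorch classifier, not the paper's exact implementation.

```python
# Minimal sketch: split conformal prediction with a calibration-time attack.
# `model` is any trained PyTorch classifier; eps_cal = 0 recovers standard split CP.
import math
import torch
import torch.nn.functional as F


def fgsm_perturb(model, x, y, eps):
    """One-step L_inf perturbation of strength eps (0 disables the attack)."""
    if eps == 0:
        return x
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()


@torch.no_grad()
def nonconformity(model, x, y):
    """Score s(x, y) = 1 - softmax probability of the true label."""
    probs = F.softmax(model(x), dim=1)
    return 1.0 - probs.gather(1, y.unsqueeze(1)).squeeze(1)


def calibrate(model, x_cal, y_cal, alpha, eps_cal):
    """Split-conformal quantile computed on adversarially perturbed calibration data."""
    x_cal_adv = fgsm_perturb(model, x_cal, y_cal, eps_cal)
    scores = nonconformity(model, x_cal_adv, y_cal)
    n = scores.numel()
    level = math.ceil((n + 1) * (1 - alpha)) / n   # finite-sample correction
    return torch.quantile(scores, min(level, 1.0)).item()


@torch.no_grad()
def prediction_sets(model, x_test, q_hat):
    """Include every label whose nonconformity score is at most the calibrated threshold."""
    probs = F.softmax(model(x_test), dim=1)
    return (1.0 - probs) <= q_hat                  # boolean mask [batch, classes]
```

Raising `eps_cal` inflates the calibration scores and hence the threshold `q_hat`, which is the knob the paper's monotonicity result says controls coverage under attacked test inputs.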


Key Contributions

  • Theoretical proof that conformal prediction coverage varies monotonically with calibration-time attack strength, enabling predictable coverage control under adversarial tests
  • Demonstration that a suitably chosen calibration-time attack maintains target coverage within a user-specified tolerance band across a contiguous range of test-time perturbation levels (see the coverage-sweep sketch after this list)
  • Theoretical and empirical evidence that adversarial training at the model-training stage tightens prediction sets while preserving high coverage, improving informativeness under adversarial conditions
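
The second contribution can be checked empirically with a simple sweep: fix the target level and the calibration-time attack strength, then measure empirical coverage as the test-time attack strength varies. The sketch below reuses `fgsm_perturb` and `prediction_sets` from the earlier sketch; the tolerance `tol` and the epsilon grid are illustrative choices, not values from the paper.

```python
# Hypothetical coverage sweep over test-time attack strengths, reusing the
# helpers defined in the split-conformal sketch above.
import torch


def coverage_sweep(model, x_test, y_test, q_hat, eps_grid, alpha, tol=0.02):
    """For each test-time eps, return empirical coverage and whether it lies
    within the tolerance band (1 - alpha) +/- tol."""
    results = {}
    for eps_test in eps_grid:
        x_adv = fgsm_perturb(model, x_test, y_test, eps_test)   # test-time attack
        sets = prediction_sets(model, x_adv, q_hat)             # [batch, classes] mask
        hits = sets[torch.arange(y_test.numel()), y_test]       # is the true label included?
        coverage = hits.float().mean().item()
        results[eps_test] = (coverage, abs(coverage - (1 - alpha)) <= tol)
    return results
```

For example, after `q_hat = calibrate(model, x_cal, y_cal, alpha=0.1, eps_cal=2/255)`, sweeping `eps_grid = [0, 1/255, 2/255, 4/255]` shows over which contiguous range of test-time strengths the chosen calibration attack keeps coverage inside the band.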

🛡️ Threat Analysis

Input Manipulation Attack

The paper's primary focus is on adversarial perturbations at inference time and how they break conformal prediction coverage guarantees. It proposes calibration-time adversarial perturbations and adversarial training as defenses, directly addressing the input manipulation threat at test time.
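
As a rough illustration of the second defense, the sketch below runs one epoch of single-step adversarial training, generating an FGSM-perturbed copy of each batch (via `fgsm_perturb` from the first sketch) and updating the model on it. The attack, `eps_train`, and the plain cross-entropy loss are placeholder choices rather than the paper's training recipe.

```python
# Minimal single-step adversarial training loop (illustrative only).
# Reuses fgsm_perturb from the split-conformal sketch above.
import torch
import torch.nn.functional as F


def adversarial_train_epoch(model, loader, optimizer, eps_train, device="cpu"):
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = fgsm_perturb(model, x, y, eps_train)   # craft perturbed batch
        optimizer.zero_grad()                          # drop gradients from the attack step
        loss = F.cross_entropy(model(x_adv), y)        # train on perturbed inputs
        loss.backward()
        optimizer.step()
```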


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
white_box, inference_time
Datasets
CIFAR-10
Applications
image classification, uncertainty quantification, clinical decision support