Defense · 2025

Ensuring Calibration Robustness in Split Conformal Prediction Under Adversarial Attacks

Xunlei Qian, Yue Xing

0 citations · 62 references · arXiv


Published on arXiv · 2511.18562

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Calibration-time adversarial perturbation strength monotonically controls test-time coverage, and adversarial training produces tighter conformal prediction sets without sacrificing coverage validity.


Conformal prediction (CP) provides distribution-free, finite-sample coverage guarantees but critically relies on exchangeability, a condition often violated under distribution shift. We study the robustness of split conformal prediction under adversarial perturbations at test time, focusing on both coverage validity and the resulting prediction set size. Our theoretical analysis characterizes how the strength of adversarial perturbations during calibration affects coverage guarantees under adversarial test conditions. We further examine the impact of adversarial training at the model-training stage. Extensive experiments support our theory: (i) prediction coverage varies monotonically with the calibration-time attack strength, so a nonzero calibration-time attack can be used to predictably control coverage under adversarial tests; (ii) target coverage can hold over a range of test-time attacks: with a suitable calibration attack, coverage stays within any chosen tolerance band across a contiguous set of perturbation levels; and (iii) adversarial training at the model-training stage produces tighter prediction sets that retain high informativeness.
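
To make the pipeline concrete, below is a minimal sketch of split conformal prediction with a calibration-time adversarial perturbation. The single-step FGSM attack, the nonconformity score (one minus the true-class softmax probability), and all function names are illustrative assumptions for a generic PyTorch classifier, not the paper's exact implementation.

```python
# Minimal sketch: split conformal prediction with a calibration-time attack.
# `model` is any trained PyTorch classifier; eps_cal = 0 recovers standard split CP.
import math
import torch
import torch.nn.functional as F


def fgsm_perturb(model, x, y, eps):
    """One-step L_inf perturbation of strength eps (0 disables the attack)."""
    if eps == 0:
        return x
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()


@torch.no_grad()
def nonconformity(model, x, y):
    """Score s(x, y) = 1 - softmax probability of the true label."""
    probs = F.softmax(model(x), dim=1)
    return 1.0 - probs.gather(1, y.unsqueeze(1)).squeeze(1)


def calibrate(model, x_cal, y_cal, alpha, eps_cal):
    """Split-conformal quantile computed on adversarially perturbed calibration data."""
    x_cal_adv = fgsm_perturb(model, x_cal, y_cal, eps_cal)
    scores = nonconformity(model, x_cal_adv, y_cal)
    n = scores.numel()
    level = math.ceil((n + 1) * (1 - alpha)) / n   # finite-sample correction
    return torch.quantile(scores, min(level, 1.0)).item()


@torch.no_grad()
def prediction_sets(model, x_test, q_hat):
    """Include every label whose nonconformity score is at most the calibrated threshold."""
    probs = F.softmax(model(x_test), dim=1)
    return (1.0 - probs) <= q_hat                  # boolean mask [batch, classes]
```

Raising `eps_cal` inflates the calibration scores and hence the threshold `q_hat`, which is the knob the paper's monotonicity result says controls coverage under attacked test inputs.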


Key Contributions

  • Theoretical proof that conformal prediction coverage varies monotonically with calibration-time attack strength, enabling predictable coverage control under adversarial tests
  • Demonstration that a suitably chosen calibration-time attack maintains target coverage within a user-specified tolerance band across a contiguous range of test-time perturbation levels (see the coverage-sweep sketch after this list)
  • Theoretical and empirical evidence that adversarial training at the model-training stage tightens prediction sets while preserving high coverage, improving informativeness under adversarial conditions
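
The second contribution can be checked empirically with a simple sweep: fix the target level and the calibration-time attack strength, then measure empirical coverage as the test-time attack strength varies. The sketch below reuses `fgsm_perturb` and `prediction_sets` from the earlier sketch; the tolerance `tol` and the epsilon grid are illustrative choices, not values from the paper.

```python
# Hypothetical coverage sweep over test-time attack strengths, reusing the
# helpers defined in the split-conformal sketch above.
import torch


def coverage_sweep(model, x_test, y_test, q_hat, eps_grid, alpha, tol=0.02):
    """For each test-time eps, return empirical coverage and whether it lies
    within the tolerance band (1 - alpha) +/- tol."""
    results = {}
    for eps_test in eps_grid:
        x_adv = fgsm_perturb(model, x_test, y_test, eps_test)   # test-time attack
        sets = prediction_sets(model, x_adv, q_hat)             # [batch, classes] mask
        hits = sets[torch.arange(y_test.numel()), y_test]       # is the true label included?
        coverage = hits.float().mean().item()
        results[eps_test] = (coverage, abs(coverage - (1 - alpha)) <= tol)
    return results
```

For example, after `q_hat = calibrate(model, x_cal, y_cal, alpha=0.1, eps_cal=2/255)`, sweeping `eps_grid = [0, 1/255, 2/255, 4/255]` shows over which contiguous range of test-time strengths the chosen calibration attack keeps coverage inside the band.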

🛡️ Threat Analysis

Input Manipulation Attack

The paper's primary focus is on adversarial perturbations at inference time and how they break conformal prediction coverage guarantees. It proposes calibration-time adversarial perturbations and adversarial training as defenses, directly addressing the input manipulation threat at test time.
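
As a rough illustration of the second defense, the sketch below runs one epoch of single-step adversarial training, generating an FGSM-perturbed copy of each batch (via `fgsm_perturb` from the first sketch) and updating the model on it. The attack, `eps_train`, and the plain cross-entropy loss are placeholder choices rather than the paper's training recipe.

```python
# Minimal single-step adversarial training loop (illustrative only).
# Reuses fgsm_perturb from the split-conformal sketch above.
import torch
import torch.nn.functional as F


def adversarial_train_epoch(model, loader, optimizer, eps_train, device="cpu"):
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = fgsm_perturb(model, x, y, eps_train)   # craft perturbed batch
        optimizer.zero_grad()                          # drop gradients from the attack step
        loss = F.cross_entropy(model(x_adv), y)        # train on perturbed inputs
        loss.backward()
        optimizer.step()
```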


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
white_box, inference_time
Datasets
CIFAR-10
Applications
image classification, uncertainty quantification, clinical decision support