Defense · 2025

Calibrated Adversarial Sampling: Multi-Armed Bandit-Guided Generalization Against Unforeseen Attacks

Rui Wang, Zeming Wei, Xiyue Zhang, Meng Sun

0 citations · 49 references · arXiv

Published on arXiv: 2511.12265

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

CAS achieves superior overall robustness against unforeseen attack types while maintaining high clean accuracy compared to existing adversarial training frameworks.

Calibrated Adversarial Sampling (CAS)

Novel technique introduced


Deep Neural Networks (DNNs) are known to be vulnerable to various adversarial perturbations. To address the safety concerns arising from these vulnerabilities, adversarial training (AT) has emerged as one of the most effective paradigms for enhancing the robustness of DNNs. However, existing AT frameworks primarily focus on a single or a limited set of attack types, leaving DNNs exposed to attack types that may be encountered in practice but are not addressed during training. In this paper, we propose an efficient fine-tuning method called Calibrated Adversarial Sampling (CAS) to address this issue. Framing attack selection as a multi-armed bandit optimization problem, CAS dynamically designs rewards and balances exploration and exploitation, accounting for the dynamic and interdependent nature of multiple robustness dimensions. Experiments on benchmark datasets show that CAS achieves superior overall robustness while maintaining high clean accuracy, providing a new paradigm for the robust generalization of DNNs.


Key Contributions

  • Proposes Calibrated Adversarial Sampling (CAS), a fine-tuning method that frames adversarial attack selection as a multi-armed bandit problem to balance exploration and exploitation across multiple attack types.
  • Dynamically designs rewards reflecting the interdependent robustness dimensions of multiple attack types to guide training toward generalized robustness.
  • Achieves superior robustness generalization to unforeseen attack types while maintaining high clean accuracy on benchmark datasets.
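The bandit framing above (attack types as arms, robustness signals as rewards) can be sketched with a simple UCB1 selection loop. Everything below is illustrative: the attack arms, the Gaussian reward stand-ins, and the exploration constant are hypothetical, not the paper's actual calibrated reward design.

```python
import math
import random

def ucb1_select(counts, values, t, c=1.0):
    """Pick the arm (attack type) maximizing the UCB1 score.

    counts[i]: times arm i was played; values[i]: its running mean reward.
    Unplayed arms are tried first to initialize their estimates.
    """
    for i, n in enumerate(counts):
        if n == 0:
            return i
    scores = [values[i] + c * math.sqrt(2.0 * math.log(t) / counts[i])
              for i in range(len(counts))]
    return max(range(len(counts)), key=scores.__getitem__)

def run_bandit(reward_fns, rounds=500, seed=0):
    """Toy training loop: each round, select an attack type, observe a
    scalar reward (a stand-in for the robustness-gain signal a method
    like CAS would compute), and update that arm's running mean."""
    rng = random.Random(seed)
    k = len(reward_fns)
    counts = [0] * k
    values = [0.0] * k
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, values, t)
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values

if __name__ == "__main__":
    # Hypothetical attack arms with different noisy average rewards.
    attacks = [
        lambda rng: rng.gauss(0.3, 0.1),  # e.g. an L-inf PGD arm
        lambda rng: rng.gauss(0.5, 0.1),  # e.g. an L-2 arm
        lambda rng: rng.gauss(0.2, 0.1),  # e.g. a spatial-transform arm
    ]
    counts, values = run_bandit(attacks)
    print(counts)  # the highest-reward arm should dominate the pulls
```

The point of the sketch is the exploration/exploitation trade-off: arms with weak current estimates still get occasional pulls via the confidence bonus, which mirrors how CAS avoids overfitting training to any single attack type.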

🛡️ Threat Analysis

Input Manipulation Attack

Proposes CAS, an adversarial training defense that enhances DNN robustness against adversarial input perturbations. By specifically targeting generalization to attack types not seen during training, it serves as a direct defense against input manipulation attacks.


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
white_box, inference_time, digital, untargeted
Datasets
CIFAR-10, CIFAR-100, ImageNet
Applications
image classification