Attack · 2026

Contract And Conquer: How to Provably Compute Adversarial Examples for a Black-Box Model?

Anna Chistyakova 1,2, Mikhail Pautov 1,2



Published on arXiv

arXiv:2603.10689

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

CAC outperforms state-of-the-art black-box adversarial attack methods on ImageNet across diverse target models, including vision transformers, with a proven bound on the number of iterations needed to find an adversarial example.

Contract And Conquer (CAC)

Novel technique introduced


Black-box adversarial attacks are widely used as tools to test the robustness of deep neural networks against malicious perturbations of input data aimed at a specific change in the output of the model. Such methods, although empirically effective, usually do not guarantee that an adversarial example can be found for a particular model. In this paper, we propose Contract And Conquer (CAC), an approach to provably compute adversarial examples for neural networks in a black-box manner. The method is based on knowledge distillation of a black-box model on an expanding distillation dataset and precise contraction of the adversarial example search space. CAC is supported by a transferability guarantee: we prove that the method yields an adversarial example for the black-box model within a fixed number of algorithm iterations. Experimentally, we demonstrate that the proposed approach outperforms existing state-of-the-art black-box attack methods on the ImageNet dataset for different target models, including vision transformers.


Key Contributions

  • CAC algorithm: iterative surrogate-model distillation combined with adversarial search-space contraction for provable black-box adversarial example generation
  • Transferability guarantee proving CAC finds an adversarial example for the black-box target within a fixed number of algorithm iterations
  • Empirical SOTA performance over existing black-box attack methods on ImageNet across multiple architectures including vision transformers
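The loop the contributions describe — distill a surrogate on an expanding dataset of target-model queries, attack the surrogate white-box, and contract the search space when the candidate fails to transfer — can be sketched as below. This is a minimal illustration only: the function names, interfaces, and the fixed contraction factor are assumptions for exposition, not the authors' actual algorithm or API.

```python
import numpy as np

def cac_attack(black_box, train_surrogate, white_box_attack,
               x, y_true, eps, max_iters=10, contraction=0.9):
    """Hypothetical sketch of a Contract-And-Conquer-style loop.

    black_box(x)                    -> label predicted by the target model
    train_surrogate(dataset)        -> surrogate classifier fit on (input, label) pairs
    white_box_attack(f, x, y, r)    -> adversarial candidate for surrogate f
                                       within search radius r around x
    """
    # Distillation dataset starts from the clean input, labeled by the target.
    dataset = [(x, black_box(x))]
    search_radius = eps  # the search space contracts across iterations

    for _ in range(max_iters):
        # Distill the black box into a surrogate on the current dataset.
        surrogate = train_surrogate(dataset)
        # White-box attack on the surrogate within the current search space.
        x_adv = white_box_attack(surrogate, x, y_true, search_radius)
        # One query to the target checks whether the candidate transfers.
        y_adv = black_box(x_adv)
        if y_adv != y_true:
            return x_adv  # adversarial for the black-box model
        # Otherwise: grow the distillation set and contract the search space.
        dataset.append((x_adv, y_adv))
        search_radius *= contraction
    return None
```

With toy stand-ins (a threshold classifier as the target, an attack that steps toward the decision boundary), the loop returns a transferring candidate on the first query.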

🛡️ Threat Analysis

Input Manipulation Attack

The core contribution is a novel method for generating adversarial examples (input perturbations causing misclassification) against black-box models at inference time, with a theoretical guarantee that an adversarial example is found within a bounded number of iterations.


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
black_box, inference_time, targeted, digital
Datasets
ImageNet
Applications
image classification