
On the Adversarial Robustness of Learning-based Conformal Novelty Detection

Daofu Zhang 1, Mehrdad Pournaderi 1, Hanne M. Clifford 2, Yu Xiang 3, Pramod K. Varshney 2

1 citation · 57 references · arXiv

Published on arXiv · 2510.00463

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Adversarial perturbations can significantly increase the false discovery rate of conformal novelty detectors while maintaining high detection power, exposing fundamental vulnerabilities in error-controlled novelty detection methods.

Surrogate Decision-Based Attack

Novel technique introduced


This paper studies the adversarial robustness of conformal novelty detection. In particular, we focus on two powerful learning-based frameworks that come with finite-sample false discovery rate (FDR) control: AdaDetect (Marandon et al., 2024), which is based on a positive-unlabeled (PU) classifier, and a one-class classifier-based approach (Bates et al., 2023). While both provide rigorous statistical guarantees under benign conditions, their behavior under adversarial perturbations remains underexplored. We first formulate an oracle attack setup, under the AdaDetect formulation, that quantifies the worst-case degradation of FDR, deriving an upper bound that characterizes the statistical cost of attacks. This idealized formulation directly motivates a practical and effective attack scheme that requires only query access to the output labels of both frameworks. Coupling these formulations with two popular and complementary black-box adversarial algorithms, we systematically evaluate the vulnerability of both frameworks on synthetic and real-world datasets. Our results show that adversarial perturbations can significantly increase the FDR while maintaining high detection power, exposing fundamental limitations of current error-controlled novelty detection methods and motivating the development of more robust alternatives.
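To make the benign-case pipeline concrete, the following is a minimal sketch of conformal novelty detection with FDR control: conformal p-values computed from held-out inlier scores, followed by the Benjamini-Hochberg procedure. All data, score distributions, and parameter values here are illustrative toys, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scores from some fitted novelty scorer (higher = more novel).
cal_scores = rng.normal(0.0, 1.0, size=500)       # held-out inlier calibration set
test_scores = np.concatenate([
    rng.normal(0.0, 1.0, size=80),                # inlier test points
    rng.normal(3.0, 1.0, size=20),                # true novelties
])

# Conformal p-value: rank of each test score among calibration scores.
n = len(cal_scores)
pvals = (1 + (cal_scores[None, :] >= test_scores[:, None]).sum(axis=1)) / (n + 1)

# Benjamini-Hochberg at level alpha yields finite-sample FDR control.
alpha = 0.1
m = len(pvals)
order = np.argsort(pvals)
thresh = alpha * np.arange(1, m + 1) / m
below = pvals[order] <= thresh
k = below.nonzero()[0].max() + 1 if below.any() else 0
rejected = np.zeros(m, dtype=bool)
rejected[order[:k]] = True                        # flag the k smallest p-values
print(f"Flagged {rejected.sum()} of {m} test points as novelties")
```

The adversarial question studied in the paper is what happens to this guarantee when the test inputs (and hence the test scores) are chosen by an attacker rather than drawn benignly.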


Key Contributions

  • Oracle attack formulation under AdaDetect that derives an upper bound on worst-case FDR degradation under adversarial perturbations
  • Practical surrogate decision-based attack requiring only query access to the output labels of both the AdaDetect and Bates et al. frameworks
  • Systematic empirical evaluation using HopSkipJump and Boundary Attack demonstrating that adversarial perturbations significantly inflate FDR while preserving detection power in conformal novelty detection
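The surrogate decision-based setting above can be sketched as a simple Boundary-Attack-style loop: the attacker only queries the detector's flag/no-flag label and shrinks the perturbation while staying inside the flagged region. The function name, step schedule, and toy detector below are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def decision_attack(x, is_flagged, x_seed, steps=200, rng=None):
    """Decision-based attack sketch: given only label queries, move a
    flagged seed point toward the clean input x while staying flagged.

    x          : clean inlier input the attacker wants falsely flagged
    is_flagged : black-box oracle, True if the detector flags the input
    x_seed     : any input the detector already flags (starting point)
    """
    rng = rng or np.random.default_rng(0)
    adv = x_seed.copy()                 # invariant: adv is always flagged
    step = 0.5
    for _ in range(steps):
        # Step toward the clean input, plus a small random jitter.
        candidate = adv + step * (x - adv)
        candidate += 0.01 * rng.normal(size=x.shape)
        if is_flagged(candidate):       # accept only if still flagged
            adv = candidate
        else:
            step *= 0.9                 # back off after crossing the boundary
    return adv

# Toy demonstration: a detector that flags points with norm > 2.
x_clean = np.zeros(2)
x_seed = np.array([3.0, 0.0])
adv = decision_attack(x_clean, lambda z: np.linalg.norm(z) > 2, x_seed)
```

The result is a point that the detector still flags but that sits much closer to the clean inlier, which is how such perturbations inflate false discoveries.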

🛡️ Threat Analysis

Input Manipulation Attack

The paper's primary contribution is adversarial input manipulation at inference time: crafting perturbations to test data that fool the positive-unlabeled and one-class classifiers underlying conformal novelty detection frameworks (AdaDetect; Bates et al.), causing novel samples to be misclassified and degrading FDR control. It applies established decision-based black-box attack algorithms (HopSkipJump, Boundary Attack) to these detection systems.
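Measuring the degradation described above comes down to two empirical quantities per run: the false discovery rate among flagged points and the detection power on true novelties. A minimal helper (the function name and interface are assumptions, not from the paper):

```python
import numpy as np

def fdr_and_power(flagged, is_novel):
    """Empirical FDR and detection power for one set of decisions.

    flagged  : boolean array, True where the detector declared a novelty
    is_novel : boolean array, ground-truth novelty labels
    """
    flagged = np.asarray(flagged, dtype=bool)
    is_novel = np.asarray(is_novel, dtype=bool)
    discoveries = flagged.sum()
    false_disc = (flagged & ~is_novel).sum()
    fdr = false_disc / max(discoveries, 1)          # 0 if no discoveries
    power = (flagged & is_novel).sum() / max(is_novel.sum(), 1)
    return fdr, power

# Example: 3 discoveries, 1 of them false, both novelties caught.
fdr, power = fdr_and_power([True, True, False, True],
                           [True, False, False, True])
```

Comparing these two numbers before and after perturbation is exactly the evaluation the paper reports: FDR inflates while power stays high.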


Details

Domains
vision, tabular
Model Types
traditional_ml
Threat Tags
black_box, inference_time, targeted
Datasets
synthetic datasets, real-world datasets (unspecified in excerpt)
Applications
novelty detection, anomaly detection, fraud detection