Dashed Line Defense: Plug-And-Play Defense Against Adaptive Score-Based Query Attacks
Yanzhang Fu, Zizheng Guo, Jizhou Luo
Published on arXiv
2602.08679
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
DLD consistently outperforms prior plug-and-play defenses (AAA and RND) on ImageNet even under worst-case adaptive score-based query attacks, while preserving the model's predicted labels.
Dashed Line Defense (DLD)
Novel technique introduced
Score-based query attacks pose a serious threat to deep learning models: using only black-box access to model output scores, an attacker crafts adversarial examples (AEs) by iteratively optimizing inputs based on observed loss values. While recent runtime defenses attempt to disrupt this process via output perturbation, most either require access to model parameters or fail when attackers adapt their tactics. In this paper, we first reveal that even the state-of-the-art plug-and-play defense can be bypassed by adaptive attacks, exposing a critical limitation of existing runtime defenses. We then propose Dashed Line Defense (DLD), a plug-and-play post-processing method specifically designed to withstand adaptive query strategies. By introducing ambiguity in how the observed loss reflects the true adversarial strength of candidate examples, DLD prevents attackers from reliably analyzing and adapting their queries, effectively disrupting the AE generation process. We provide theoretical guarantees of DLD's defense capability and validate its effectiveness through experiments on ImageNet, demonstrating that DLD consistently outperforms prior defenses, even under worst-case adaptive attacks, while preserving the model's predicted labels.
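To make the core idea concrete, below is a minimal, hypothetical sketch of a DLD-style post-processing step. The paper's actual mapping is not reproduced here; this sketch only illustrates the stated mechanism: the released top-class confidence is passed through a sawtooth ("dashed line") mapping so it no longer varies monotonically with the model's true confidence, while the argmax (predicted label) is preserved. The function name and the `period` parameter are illustrative assumptions.

```python
import numpy as np

def dashed_line_postprocess(probs, period=0.05):
    """Hypothetical sketch of a DLD-style output perturbation.

    The top-class probability is remapped through a sawtooth function,
    so two inputs with slightly different true confidences can receive
    released scores in the opposite order. An attacker comparing query
    losses therefore gets an ambiguous signal of adversarial strength.
    The predicted label (argmax) is unchanged. Not the paper's exact rule.
    """
    probs = np.asarray(probs, dtype=float)
    top = int(np.argmax(probs))
    c = probs[top]
    # Sawtooth: released confidence wraps around every `period` and is
    # kept above 0.5 so the original argmax remains the argmax.
    c_new = 0.5 + 0.5 * ((c % period) / period)
    # Rescale the remaining classes to keep a valid distribution.
    out = probs * (1.0 - c_new) / max(1.0 - c, 1e-12)
    out[top] = c_new
    return out
```

Note the non-monotonicity: a candidate with *higher* true confidence in the target class can receive a *lower* released score, which is exactly the ambiguity that breaks score-guided query optimization.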
Key Contributions
- Reveals that state-of-the-art plug-and-play runtime defense (AAA) can be bypassed by adaptive score-based query attacks, exposing a critical gap between claimed and actual robustness.
- Proposes Dashed Line Defense (DLD), a plug-and-play post-processing method using a non-continuous loss mapping to introduce ambiguity and disrupt adversarial optimization without altering predicted labels or requiring model parameter access.
- Provides theoretical guarantees of DLD's defense strength and demonstrates superior performance over AAA and RND on ImageNet under worst-case adaptive attacks.
🛡️ Threat Analysis
Directly defends against score-based query attacks (e.g., ZOO-style adversarial example generation) at inference time, where the attacker has only black-box access to model output scores — the canonical ML01 threat of adversarial input manipulation.