
Dashed Line Defense: Plug-And-Play Defense Against Adaptive Score-Based Query Attacks

Yanzhang Fu , Zizheng Guo , Jizhou Luo

0 citations · 36 references · arXiv (Cornell University)


Published on arXiv · arXiv:2602.08679

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

DLD consistently outperforms prior plug-and-play defenses (AAA and RND) on ImageNet even under worst-case adaptive score-based query attacks, while preserving the model's predicted labels.

Dashed Line Defense (DLD)

Novel technique introduced


Score-based query attacks pose a serious threat to deep learning models by crafting adversarial examples (AEs) using only black-box access to model output scores, iteratively optimizing inputs based on observed loss values. While recent runtime defenses attempt to disrupt this process via output perturbation, most either require access to model parameters or fail when attackers adapt their tactics. In this paper, we first reveal that even the state-of-the-art plug-and-play defense can be bypassed by adaptive attacks, exposing a critical limitation of existing runtime defenses. We then propose Dashed Line Defense (DLD), a plug-and-play post-processing method specifically designed to withstand adaptive query strategies. By introducing ambiguity in how the observed loss reflects the true adversarial strength of candidate examples, DLD prevents attackers from reliably analyzing and adapting their queries, effectively disrupting the AE generation process. We provide theoretical guarantees of DLD's defense capability and validate its effectiveness through experiments on ImageNet, demonstrating that DLD consistently outperforms prior defenses, even under worst-case adaptive attacks, while preserving the model's predicted labels.
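The abstract does not give DLD's exact mapping, but the core idea of a non-continuous, label-preserving post-processing of output scores can be sketched as follows. This is an illustrative toy (the `dashed_postprocess` function, the `step` parameter, and the margin-quantization scheme are all assumptions, not the paper's method): the top-class margin is snapped onto flat plateaus, so small adversarial perturbations often produce no observable score change, while the argmax, and hence the predicted label, is unchanged.

```python
import numpy as np

def dashed_postprocess(logits, step=0.5):
    """Illustrative 'dashed' post-processing (NOT the paper's exact mapping):
    quantize the gap between the top class and the runner-up so that nearby
    inputs map to identical observed scores, while the predicted label
    (argmax) is preserved."""
    z = np.asarray(logits, dtype=float)
    top = int(np.argmax(z))
    runner_up = np.partition(z, -2)[-2]          # second-largest score
    margin = z[top] - runner_up                  # true confidence gap
    quantized = step * np.floor(margin / step)   # non-continuous plateau
    out = z.copy()
    # Re-attach the quantized margin; the small epsilon keeps the argmax
    # strictly on the original top class even when the plateau is 0.
    out[top] = runner_up + max(quantized, 1e-6)
    return out
```

Two inputs whose margins land on the same plateau return identical scores, which is exactly the ambiguity that starves a query attack of gradient-like signal.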


Key Contributions

  • Reveals that state-of-the-art plug-and-play runtime defense (AAA) can be bypassed by adaptive score-based query attacks, exposing a critical gap between claimed and actual robustness.
  • Proposes Dashed Line Defense (DLD), a plug-and-play post-processing method using a non-continuous loss mapping to introduce ambiguity and disrupt adversarial optimization without altering predicted labels or requiring model parameter access.
  • Provides theoretical guarantees of DLD's defense strength and demonstrates superior performance over AAA and RND on ImageNet under worst-case adaptive attacks.

🛡️ Threat Analysis

Input Manipulation Attack

Directly defends against score-based query attacks (ZOO-based adversarial example generation) at inference time using black-box access to model output scores — the canonical ML01 threat of adversarial input manipulation.
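To make the threat concrete, here is a minimal random-search score-based attack loop of the kind such defenses disrupt (a SimBA-style sketch; `simple_score_attack` and its parameters are illustrative names, not from the paper). The attacker only sees returned scores and keeps a perturbation when the true-class score drops; a defense that flattens or scrambles that signal stalls the accept/reject decision.

```python
import numpy as np

def simple_score_attack(score_fn, x, true_label, steps=200, eps=0.05, seed=0):
    """Toy black-box score-based attack: randomly perturb one coordinate
    and accept the change only if the victim's observed score for the
    true class decreases. Purely illustrative."""
    rng = np.random.default_rng(seed)
    x_adv = x.copy()
    best = score_fn(x_adv)[true_label]
    for _ in range(steps):
        delta = np.zeros_like(x)
        delta.flat[rng.integers(x.size)] = rng.choice([-eps, eps])
        cand = np.clip(x_adv + delta, 0.0, 1.0)
        s = score_fn(cand)[true_label]
        if s < best:              # greedy accept on the observed loss
            x_adv, best = cand, s
    return x_adv
```

Against an undefended model the observed score drops monotonically; under a plateau-style defense most queries return unchanged scores, so almost every candidate is rejected and the optimization makes no progress.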


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
black_box, inference_time, digital
Datasets
ImageNet
Applications
image classification