Dashed Line Defense: Plug-And-Play Defense Against Adaptive Score-Based Query Attacks
Yanzhang Fu, Zizheng Guo, Jizhou Luo
Published on arXiv
2602.08679
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
DLD consistently outperforms prior plug-and-play defenses (AAA and RND) on ImageNet even under worst-case adaptive score-based query attacks, while preserving the model's predicted labels.
Dashed Line Defense (DLD)
Novel technique introduced
Score-based query attacks pose a serious threat to deep learning models: using only black-box access to model output scores, an attacker crafts adversarial examples (AEs) by iteratively optimizing inputs based on observed loss values. While recent runtime defenses attempt to disrupt this process via output perturbation, most either require access to model parameters or fail when attackers adapt their tactics. In this paper, we first reveal that even the state-of-the-art plug-and-play defense can be bypassed by adaptive attacks, exposing a critical limitation of existing runtime defenses. We then propose Dashed Line Defense (DLD), a plug-and-play post-processing method specifically designed to withstand adaptive query strategies. By introducing ambiguity in how the observed loss reflects the true adversarial strength of candidate examples, DLD prevents attackers from reliably analyzing and adapting their queries, effectively disrupting the AE generation process. We provide theoretical guarantees of DLD's defense capability and validate its effectiveness through experiments on ImageNet, demonstrating that DLD consistently outperforms prior defenses, even under worst-case adaptive attacks, while preserving the model's predicted labels.
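To make the core idea concrete, below is a minimal, hypothetical sketch of a DLD-style post-processing step. The paper's actual mapping is not reproduced here; this sketch only illustrates the stated mechanism: the released top-class confidence is passed through a sawtooth ("dashed line") mapping so it no longer varies monotonically with the model's true confidence, while the argmax (predicted label) is preserved. The function name and the `period` parameter are illustrative assumptions.

```python
import numpy as np

def dashed_line_postprocess(probs, period=0.05):
    """Hypothetical sketch of a DLD-style output perturbation.

    The top-class probability is remapped through a sawtooth function,
    so two inputs with slightly different true confidences can receive
    released scores in the opposite order. An attacker comparing query
    losses therefore gets an ambiguous signal of adversarial strength.
    The predicted label (argmax) is unchanged. Not the paper's exact rule.
    """
    probs = np.asarray(probs, dtype=float)
    top = int(np.argmax(probs))
    c = probs[top]
    # Sawtooth: released confidence wraps around every `period` and is
    # kept above 0.5 so the original argmax remains the argmax.
    c_new = 0.5 + 0.5 * ((c % period) / period)
    # Rescale the remaining classes to keep a valid distribution.
    out = probs * (1.0 - c_new) / max(1.0 - c, 1e-12)
    out[top] = c_new
    return out
```

Note the non-monotonicity: a candidate with *higher* true confidence in the target class can receive a *lower* released score, which is exactly the ambiguity that breaks score-guided query optimization.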
Key Contributions
- Reveals that state-of-the-art plug-and-play runtime defense (AAA) can be bypassed by adaptive score-based query attacks, exposing a critical gap between claimed and actual robustness.
- Proposes Dashed Line Defense (DLD), a plug-and-play post-processing method using a non-continuous loss mapping to introduce ambiguity and disrupt adversarial optimization without altering predicted labels or requiring model parameter access.
- Provides theoretical guarantees of DLD's defense strength and demonstrates superior performance over AAA and RND on ImageNet under worst-case adaptive attacks.
🛡️ Threat Analysis
Directly defends against score-based query attacks (e.g., ZOO-style adversarial example generation) at inference time, where the attacker has only black-box access to model output scores — the canonical ML01 threat of adversarial input manipulation.