attack 2026

Gradient Structure Estimation under Label-Only Oracles via Spectral Sensitivity

0 citations · 60 references · arXiv

Published on arXiv

2601.14300

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

DPAttack consistently surpasses SOTA hard-label black-box attacks in attack success rate and query efficiency across multiple benchmarks, achieving 0% detection rate against the Blacklight stateful defense.

Pattern-Driven Optimization (PDO) / DPAttack

Novel technique introduced

Hard-label black-box settings, where only top-1 predicted labels are observable, pose a fundamentally constrained yet practically important feedback model for understanding model behavior. A central challenge in this regime is whether meaningful gradient information can be recovered from such discrete responses. In this work, we develop a unified theoretical perspective showing that a wide range of existing sign-flipping hard-label attacks can be interpreted as implicitly approximating the sign of the true loss gradient. This observation reframes hard-label attacks from heuristic search procedures into instances of gradient sign recovery under extremely limited feedback. Motivated by this first-principles understanding, we propose a new attack framework that combines a zero-query frequency-domain initialization with a Pattern-Driven Optimization (PDO) strategy. We establish theoretical guarantees demonstrating that, under mild assumptions, our initialization achieves higher expected cosine similarity to the true gradient sign than random baselines, while the proposed PDO procedure attains substantially lower query complexity than existing structured search approaches. We empirically validate our framework through extensive experiments on CIFAR-10, ImageNet, and ObjectNet, covering standard and adversarially trained models, commercial APIs, and CLIP-based models. The results show that our method consistently surpasses SOTA hard-label attacks in both attack success rate and query efficiency, particularly in low-query regimes. Beyond image classification, our approach generalizes effectively to corrupted data, biomedical datasets, and dense prediction tasks. Notably, it also successfully circumvents Blacklight, a SOTA stateful defense, resulting in a 0% detection rate. Our code will be released publicly soon at https://github.com/csjunjun/DPAttack.git.
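The gradient-sign-recovery view can be illustrated on a toy problem. The sketch below is not the paper's DPAttack or PDO procedure: it is a generic block-wise sign-flipping search in the style of ray-search attacks, run against a hypothetical linear classifier whose oracle returns only the top-1 label. The names make_oracle, boundary_radius, sign_flip_search, and cosine, as well as the linear model itself, are assumptions made for illustration.

```python
import numpy as np

def make_oracle(w, b):
    """Hypothetical label-only oracle: exposes the top-1 label and nothing else."""
    def oracle(x):
        return int(w @ x + b > 0)
    return oracle

def boundary_radius(oracle, x, y, d, hi=1e3, tol=1e-6):
    """Binary-search the smallest step t with oracle(x + t*d) != y (inf if none up to hi)."""
    if oracle(x + hi * d) == y:
        return np.inf
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if oracle(x + mid * d) == y:
            lo = mid   # still classified as y: boundary lies further out
        else:
            hi = mid   # label changed: boundary lies closer in
    return hi

def sign_flip_search(oracle, x, y, d, min_block=1):
    """Flip blocks of signs in d; keep a flip only if it shortens the boundary distance."""
    d = d.copy()
    best = boundary_radius(oracle, x, y, d)
    block = len(d)
    while block >= min_block:
        for start in range(0, len(d), block):
            cand = d.copy()
            cand[start:start + block] *= -1
            r = boundary_radius(oracle, x, y, cand)
            if r < best:
                d, best = cand, r
        block //= 2   # refine from coarse blocks down to single coordinates
    return d, best

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy setup (assumed): a 64-dimensional linear model with x on the positive side.
rng = np.random.default_rng(0)
n = 64
w = rng.normal(size=n)
x = rng.normal(size=n)
b = 1.0 - w @ x                        # gives x a margin of 1
oracle = make_oracle(w, b)
y = oracle(x)
d0 = rng.choice([-1.0, 1.0], size=n)   # random initial sign pattern
target = -np.sign(w)                   # sign of the direction that reduces the margin fastest

d, r = sign_flip_search(oracle, x, y, d0)
print("cosine before:", cosine(d0, target))
print("cosine after: ", cosine(d, target))
print("boundary distance:", r)
```

In this linear case the direction that most quickly reduces the margin has sign -sign(w), so the cosine similarity between the recovered pattern and that sign vector can be checked directly. The search only ever sees labels, yet each accepted flip moves the pattern toward that sign vector, which is the sense in which a sign-flipping search approximates the gradient sign.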

Key Contributions

Unified theoretical framework showing existing hard-label sign-flipping attacks implicitly approximate the true loss gradient sign, reframing them as gradient sign recovery problems.
New attack framework combining a zero-query frequency-domain initialization, which achieves higher expected cosine similarity to the true gradient sign than random baselines, with Pattern-Driven Optimization (PDO), which attains lower query complexity than prior structured search approaches.
Empirical validation surpassing SOTA hard-label attacks across CIFAR-10, ImageNet, ObjectNet, and biomedical datasets, including 0% detection rate against the Blacklight stateful defense.
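For context on the random baseline mentioned in the contributions: a uniformly random sign vector has expected cosine similarity 0 with any fixed sign vector, with standard deviation 1/sqrt(n) in n dimensions. The sketch below checks this numerically. The function name random_baseline_stats is an assumption, and the "gradient sign" here is only a random stand-in, since the paper's initialization is not reproduced.

```python
import numpy as np

def random_baseline_stats(n, trials=2000, seed=0):
    """Mean and std of cosine similarity between random sign vectors and a fixed sign vector."""
    rng = np.random.default_rng(seed)
    g = rng.choice([-1.0, 1.0], size=n)              # stand-in for a true gradient sign
    R = rng.choice([-1.0, 1.0], size=(trials, n))    # random sign initializations
    sims = (R @ g) / n                               # cosine similarity: both norms are sqrt(n)
    return float(sims.mean()), float(sims.std())

n = 32 * 32 * 3   # input dimension of a CIFAR-10 image
mean, std = random_baseline_stats(n)
print("mean cosine:", mean)
print("std cosine: ", std, "vs 1/sqrt(n) =", 1 / np.sqrt(n))
```

Any initialization whose expected cosine similarity is bounded away from 0 therefore improves on this baseline, and in high dimensions even a small positive value is many standard deviations above what random signs produce.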

🛡️ Threat Analysis

Input Manipulation Attack

Proposes a novel adversarial example attack (PDO/DPAttack) operating under hard-label black-box constraints, crafting inputs that cause misclassification at inference time by recovering gradient sign information from discrete label feedback — a canonical input manipulation attack.

Details

Domains

vision

Model Types

cnn, transformer

Threat Tags

black_box, inference_time, digital

Datasets

CIFAR-10, ImageNet, ObjectNet, ImageNet-C, PathMNIST

Applications

image classification, semantic segmentation, biomedical image classification

