
TriQDef: Disrupting Semantic and Gradient Alignment to Prevent Adversarial Patch Transferability in Quantized Neural Networks

Amira Guesmi 1, Bassem Ouni 2, Muhammad Shafique 1



Published on arXiv (2508.12132)

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

TriQDef reduces adversarial patch Attack Success Rate by over 40% on unseen patch and quantization combinations while preserving clean accuracy.

TriQDef

Novel technique introduced


Quantized Neural Networks (QNNs) are increasingly deployed in edge and resource-constrained environments due to their efficiency in computation and memory usage. While quantization has been shown to distort the gradient landscape and weaken conventional pixel-level attacks, it provides limited robustness against patch-based adversarial attacks: localized, high-saliency perturbations that remain surprisingly transferable across bit-widths. Existing defenses either overfit to fixed quantization settings or fail to address this cross-bit generalization vulnerability. We introduce TriQDef, a tri-level quantization-aware defense framework designed to disrupt the transferability of patch-based adversarial attacks across QNNs. TriQDef consists of: (1) a Feature Disalignment Penalty (FDP) that enforces semantic inconsistency by penalizing perceptual similarity in intermediate representations; (2) a Gradient Perceptual Dissonance Penalty (GPDP) that explicitly misaligns input gradients across bit-widths by minimizing structural and directional agreement via Edge IoU and HOG Cosine metrics; and (3) a Joint Quantization-Aware Training Protocol that unifies these penalties within a shared-weight training scheme across multiple quantization levels. Extensive experiments on CIFAR-10 and ImageNet demonstrate that TriQDef reduces Attack Success Rates (ASR) by over 40% on unseen patch and quantization combinations, while preserving high clean accuracy. Our findings underscore the importance of disrupting both semantic and perceptual gradient alignment to mitigate patch transferability in QNNs.
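The abstract names two gradient-agreement metrics used by the GPDP term, Edge IoU and HOG Cosine. The paper does not spell out their computation here, so the following is only a minimal sketch of how such metrics might be measured on input-gradient maps: edges are approximated by thresholding normalized gradient magnitude (a stand-in for a proper edge detector), and the HOG step is reduced to a single global orientation histogram. Function names (`edge_iou`, `hog_cosine`, `gpdp_penalty`) and the pairwise averaging are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def edge_iou(g1, g2, thresh=0.5):
    """IoU of binary edge maps derived from two input-gradient maps.
    Edges are approximated by thresholding the normalized magnitude
    (illustrative stand-in for a Sobel/Canny edge map)."""
    def edges(g):
        m = np.abs(g)
        return m / (m.max() + 1e-8) > thresh
    e1, e2 = edges(g1), edges(g2)
    inter = np.logical_and(e1, e2).sum()
    union = np.logical_or(e1, e2).sum()
    return inter / (union + 1e-8)

def hog_cosine(g1, g2, n_bins=9):
    """Cosine similarity between magnitude-weighted orientation
    histograms of two gradient maps (global HOG simplification)."""
    def hog(g):
        gy, gx = np.gradient(g)
        mag = np.hypot(gx, gy)
        ang = np.arctan2(gy, gx) % np.pi  # unsigned orientation in [0, pi)
        bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
        h = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
        return h / (np.linalg.norm(h) + 1e-8)
    return float(hog(g1) @ hog(g2))

def gpdp_penalty(grads_by_bitwidth):
    """Average pairwise structural + directional agreement across
    bit-widths; minimizing this during training would push the
    gradients out of alignment (hypothetical aggregation)."""
    gs = list(grads_by_bitwidth)
    total, pairs = 0.0, 0
    for i in range(len(gs)):
        for j in range(i + 1, len(gs)):
            total += edge_iou(gs[i], gs[j]) + hog_cosine(gs[i], gs[j])
            pairs += 1
    return total / max(pairs, 1)
```

By construction, two identical gradient maps score 1.0 on both metrics, so the penalty is maximal exactly when gradients agree across bit-widths, which is the alignment TriQDef aims to break.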


Key Contributions

  • Feature Disalignment Penalty (FDP) that enforces semantic inconsistency between clean and perturbed intermediate representations to block patch transferability
  • Gradient Perceptual Dissonance Penalty (GPDP) that minimizes structural and directional gradient agreement across bit-widths using Edge IoU and HOG Cosine metrics
  • Joint Quantization-Aware Training Protocol that unifies both penalties in a shared-weight scheme across multiple quantization levels to block cross-bit-width generalization
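The three contributions above combine into a single training objective: a task loss shared across quantization levels plus the FDP and GPDP penalties. The sketch below shows one plausible shape for that objective, assuming FDP is a cosine similarity between clean and patched intermediate features and GPDP arrives as a precomputed gradient-agreement score; the function names and the weights `lam_fdp`/`lam_gpdp` are illustrative, not taken from the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, label):
    """Negative log-likelihood of the true class."""
    return -np.log(softmax(logits)[label] + 1e-12)

def fdp_penalty(feat_clean, feat_patched):
    """Hypothetical FDP form: cosine similarity between clean and
    patched intermediate features; minimizing it drives the two
    representations apart (semantic inconsistency)."""
    a, b = feat_clean.ravel(), feat_patched.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def triqdef_loss(logits_by_bw, label, feats_clean, feats_patched,
                 gpdp, lam_fdp=0.1, lam_gpdp=0.1):
    """Joint quantization-aware objective (sketch): task loss averaged
    over bit-widths of the shared-weight model, plus weighted FDP and
    GPDP penalties. `gpdp` is a precomputed cross-bit gradient
    agreement score."""
    task = float(np.mean([cross_entropy(l, label) for l in logits_by_bw]))
    fdp = float(np.mean([fdp_penalty(c, p)
                         for c, p in zip(feats_clean, feats_patched)]))
    return task + lam_fdp * fdp + lam_gpdp * gpdp
```

Averaging the task loss over bit-widths reflects the shared-weight scheme: one set of weights must stay accurate at every quantization level while the penalty terms decorrelate what an attacker could transfer between them.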

🛡️ Threat Analysis

Input Manipulation Attack

The paper directly defends against adversarial patch attacks — localized high-saliency perturbations that cause misclassification at inference time. TriQDef is a defense that reduces Attack Success Rate of patch-based adversarial examples by over 40% across unseen quantization levels.


Details

Domains
vision
Model Types
cnn
Threat Tags
black_box, inference_time, digital, physical
Datasets
CIFAR-10, ImageNet
Applications
image classification, edge inference on quantized models