Adversarial Vulnerability Transcends Computational Paradigms: Feature Engineering Provides No Defense Against Neural Adversarial Transfer
Achraf Hsain, Ahmed Abdelkader, Emmanuel Baldwin Mbaya, Hamoud Aljamaan
Published on arXiv: 2601.21323
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
All HOG-based classifiers suffer 16.6–59.1% relative accuracy drops under neural adversarial transfer, with single-step FGSM causing greater degradation than iterative PGD in 100% of classical ML cases, because PGD overfits to surrogate-specific features that do not survive HOG extraction.
Deep neural networks are vulnerable to adversarial examples: inputs with imperceptible perturbations causing misclassification. While adversarial transfer within neural networks is well-documented, whether classical ML pipelines using handcrafted features inherit this vulnerability when attacked via neural surrogates remains unexplored. Feature engineering creates information bottlenecks through gradient quantization and spatial binning, potentially filtering high-frequency adversarial signals. We evaluate this hypothesis through the first comprehensive study of adversarial transfer from DNNs to HOG-based classifiers. Using VGG16 as a surrogate, we generate FGSM and PGD adversarial examples and test transfer to four classical classifiers (KNN, Decision Tree, Linear SVM, Kernel SVM) and a shallow neural network across eight HOG configurations on CIFAR-10. Our results strongly refute the protective hypothesis: all classifiers suffer 16.6–59.1% relative accuracy drops, comparable to neural-to-neural transfer. More surprisingly, we discover attack hierarchy reversal: contrary to patterns where iterative PGD dominates FGSM within neural networks, FGSM causes greater degradation than PGD in 100% of classical ML cases, suggesting iterative attacks overfit to surrogate-specific features that don't survive feature extraction. Block normalization provides partial but insufficient mitigation. These findings demonstrate that adversarial vulnerability is not an artifact of end-to-end differentiability but a fundamental property of image classification systems, with implications for security-critical deployments across computational paradigms.
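To make the two attacks concrete, here is a minimal numpy-only sketch of FGSM and L∞-bounded PGD. It substitutes a toy logistic-regression surrogate (with a closed-form input gradient) for the paper's VGG16; the helper names (`input_grad`, `fgsm`, `pgd`) and parameter choices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy differentiable "surrogate": binary logistic regression. For this model
# the input gradient of the cross-entropy loss is (sigmoid(w.x + b) - y) * w.
w = rng.normal(size=32)
b = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_grad(x, y):
    """Gradient of the binary cross-entropy loss w.r.t. the input x."""
    return (sigmoid(w @ x + b) - y) * w

def fgsm(x, y, eps):
    """Single-step FGSM: one step of size eps along the gradient sign."""
    return np.clip(x + eps * np.sign(input_grad(x, y)), 0.0, 1.0)

def pgd(x, y, eps, alpha=None, steps=10):
    """Iterative PGD: repeated signed steps, each projected back into
    the L-infinity ball of radius eps around the clean input."""
    alpha = eps / 4 if alpha is None else alpha
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(input_grad(x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay in valid pixel range
    return x_adv

x = rng.uniform(0, 1, size=32)
x_fgsm = fgsm(x, y=1.0, eps=8 / 255)
x_pgd = pgd(x, y=1.0, eps=8 / 255)
```

The paper's reversal finding is intuitive from this structure: PGD's repeated gradient steps track the surrogate's loss surface closely, so its perturbations are more surrogate-specific, while FGSM's single coarse step transfers better once HOG quantization discards fine detail.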
Key Contributions
- First systematic evaluation of L∞-bounded adversarial transfer from CNNs (VGG16) to HOG-based classifiers across four classical ML models and eight HOG configurations, showing 16.6–59.1% relative accuracy drops
- Discovery of 'attack hierarchy reversal': FGSM causes greater degradation than PGD in 100% of classical ML transfer cases, the inverse of patterns observed in neural-to-neural transfer
- Systematic HOG parameter sensitivity analysis (cell size, orientation bins, block normalization) showing block normalization offers partial but insufficient mitigation against transferred perturbations
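The HOG parameters varied in the sensitivity analysis can be seen in a compact sketch of the descriptor itself: per-cell orientation histograms followed by L2 normalization over overlapping blocks. This is a simplified illustration (no gradient interpolation or L2-Hys clipping, hypothetical function name), not the paper's exact extraction pipeline.

```python
import numpy as np

def hog_features(img, cell=8, bins=9, block=2):
    """Minimal HOG sketch: orientation histograms per cell-by-cell patch,
    then L2 normalization over overlapping block-by-block groups of cells."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientations
    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    # Quantize each pixel's orientation into one of `bins` histogram bins.
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            sl = np.s_[cy * cell:(cy + 1) * cell, cx * cell:(cx + 1) * cell]
            np.add.at(hist[cy, cx], bin_idx[sl].ravel(), mag[sl].ravel())
    # Block normalization: each overlapping block of cells is L2-normalized,
    # discarding absolute gradient magnitude -- the step the paper finds
    # gives partial (but insufficient) attenuation of transferred noise.
    blocks = []
    for by in range(n_cy - block + 1):
        for bx in range(n_cx - block + 1):
            v = hist[by:by + block, bx:bx + block].ravel()
            blocks.append(v / (np.linalg.norm(v) + 1e-6))
    return np.concatenate(blocks)

# On a 32x32 CIFAR-10-sized image with 8x8 cells: a 4x4 grid of cells,
# nine overlapping 2x2 blocks, 36 values each -> a 324-dim descriptor.
feat = hog_features(np.random.default_rng(1).uniform(size=(32, 32)))
```

Orientation quantization and spatial pooling are exactly the "information bottlenecks" the protective hypothesis relied on; the paper's result is that enough adversarial signal survives them anyway.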
🛡️ Threat Analysis
The paper studies the transferability of gradient-based adversarial examples (FGSM and PGD, generated on a VGG16 surrogate) to HOG-based classical ML classifiers at inference time. This is squarely an input manipulation / adversarial transfer study, with the novel finding that feature engineering provides no defense.