
ANCHOR: Integrating Adversarial Training with Hard-mined Supervised Contrastive Learning for Robust Representation Learning

Samarup Bhattacharya, Anubhab Bhattacharya, Abir Chakraborty

Published on arXiv · 2510.27599 · 0 citations · 34 references

Input Manipulation Attack (OWASP ML Top 10 — ML01)

Key Finding

ANCHOR outperforms standard adversarial training methods on CIFAR-10 in both clean accuracy and robust accuracy under PGD-20 (ε=0.031), narrowing the accuracy-robustness trade-off gap.

ANCHOR (novel technique introduced)


Neural networks have changed the way machines interpret the world. At their core, they learn by following gradients, adjusting their parameters step by step until they identify the most discriminative patterns in the data. This process gives them their strength, yet it also opens the door to a hidden flaw: the very gradients that help a model learn can be used to craft small, imperceptible perturbations that flip its decisions. Adversarial attacks exploit this vulnerability, adding tiny changes to images that leave them visually unchanged to a human yet cause the model to make wrong predictions. In this work, we propose Adversarially-trained Contrastive Hard-mining for Optimized Robustness (ANCHOR), a framework that combines supervised contrastive learning with explicit hard positive mining. ANCHOR trains the model to embed each image, its augmentations, and its perturbed versions close together, alongside other images of the same class, while keeping them separated from images of other classes. This alignment helps the model focus on stable, meaningful patterns rather than fragile gradient cues. On CIFAR-10, our approach achieves strong results for both clean and robust accuracy under PGD-20 (epsilon = 0.031), outperforming standard adversarial training methods. Our results indicate that combining adversarial guidance with hard-mined contrastive supervision helps models learn more structured and robust representations, narrowing the gap between accuracy and robustness.


Key Contributions

  • ANCHOR framework that pairs supervised contrastive loss with adaptive hard positive mining, dynamically weighting intra-class samples most dissimilar to their class counterparts during adversarial training
  • Clustering adversarial perturbations, clean images, and augmentations together in embedding space to learn stable, non-fragile representations
  • Demonstrated improved clean and robust accuracy on CIFAR-10 under PGD-20 (ε=0.031) over standard adversarial training baselines
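The hard-mined supervised contrastive idea above can be sketched as a forward-pass loss computation. The sketch below is illustrative, not the paper's implementation: it uses the standard supervised contrastive (SupCon) formulation, and the hard-positive weighting (positives weighted by `1 - cosine similarity`, so more dissimilar same-class samples count more) is an assumed stand-in for the paper's adaptive mining rule.

```python
import numpy as np

def hard_mined_supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss with an illustrative hard-positive weighting.

    z      : (N, D) array of L2-normalised embeddings
    labels : (N,) integer class labels
    tau    : temperature

    NOTE: the weighting scheme (harder, i.e. less similar, positives get
    larger weight) is an assumption for illustration; the paper's exact
    mining rule may differ.
    """
    n = z.shape[0]
    cos = z @ z.T                              # pairwise cosine similarities
    sim = cos / tau                            # temperature-scaled logits
    self_mask = np.eye(n, dtype=bool)

    # log-softmax denominator over all non-self pairs
    sim_no_self = np.where(self_mask, -np.inf, sim)
    log_den = np.log(np.exp(sim_no_self).sum(axis=1, keepdims=True))
    log_prob = sim - log_den                   # log p(a | anchor)

    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    w = np.where(pos, 1.0 - cos, 0.0)          # hard positives weigh more
    w_sum = w.sum(axis=1)
    valid = w_sum > 0                          # anchors with >= 1 positive
    loss_per_anchor = -(w * log_prob).sum(axis=1)[valid] / w_sum[valid]
    return loss_per_anchor.mean()
```

In ANCHOR's setting, the batch would contain clean images, their augmentations, and their adversarial versions, so this loss pulls all three toward the same class cluster.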

🛡️ Threat Analysis

Input Manipulation Attack

The paper proposes ANCHOR, a defense against adversarial input perturbations (FGSM, PGD, C&W attacks) that cause misclassification at inference time. The defense uses adversarial training augmented with hard-mined supervised contrastive learning to build representations that cluster adversarial examples with their clean counterparts, directly targeting adversarial robustness.
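To make the threat concrete, the following is a minimal PGD sketch under the same L-infinity budget the paper evaluates (eps = 0.031, 20 steps). It uses a toy binary logistic-regression "model" instead of a CNN so the input gradient is analytic and the code stays self-contained; the attack loop itself (random start, signed-gradient ascent on the loss, projection back into the eps-ball and valid pixel range) follows standard PGD. The step size `alpha` is an assumed value.

```python
import numpy as np

def pgd_linf(x, y, w, b, eps=0.031, alpha=0.007, steps=20):
    """PGD attack (L-infinity) on a toy logistic-regression model.

    x : (N, D) inputs in [0, 1]        y : (N,) binary labels in {0, 1}
    w : (D,) weights                   b : scalar bias

    The model is a stand-in so d(loss)/dx is analytic; a real attack
    would backpropagate through the network instead.
    """
    x_adv = x + np.random.uniform(-eps, eps, size=x.shape)  # random start
    for _ in range(steps):
        # d/dx of the BCE loss for logit = x.w + b is (sigmoid(logit) - y) * w
        p = 1.0 / (1.0 + np.exp(-(x_adv @ w + b)))
        grad = (p - y)[:, None] * w[None, :]
        x_adv = x_adv + alpha * np.sign(grad)       # ascent on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)    # project into eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)            # keep valid pixel range
    return x_adv
```

Adversarial training (and ANCHOR on top of it) generates such perturbed inputs during training and optimizes the model on them, so that the eps-ball around each clean image maps to the correct class.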


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box, black_box, inference_time, training_time, untargeted, digital
Datasets
CIFAR-10
Applications
image classification