Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness
Tsubasa Takahashi 1, Shojiro Yamabe 1,2, Futa Waseda 1,3, Kento Sasaki 1,4
Published on arXiv
arXiv:2510.00517
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Differential Attention models are more adversarially vulnerable than their standard-attention counterparts: the subtractive structure encourages negative gradient alignment, which amplifies gradient norms. Shallow DA stacks, however, retain partial robustness to small-budget attacks via depth-dependent noise cancellation.
Fragile Principle
Novel technique introduced
Differential Attention (DA) has been proposed as a refinement to standard attention, suppressing redundant or noisy context through a subtractive structure and thereby reducing contextual hallucination. While this design sharpens task-relevant focus, we show that it also introduces a structural fragility under adversarial perturbations. Our theoretical analysis identifies negative gradient alignment, a configuration encouraged by DA's subtraction, as the key driver of sensitivity amplification, leading to increased gradient norms and elevated local Lipschitz constants. We empirically validate this Fragile Principle through systematic experiments on ViT/DiffViT and evaluations of pretrained CLIP/DiffCLIP, spanning five datasets in total. These results demonstrate higher attack success rates, frequent gradient opposition, and stronger local sensitivity compared to standard attention. Furthermore, depth-dependent experiments reveal a robustness crossover: stacking DA layers attenuates small perturbations via depth-dependent noise cancellation, though this protection fades under larger attack budgets. Overall, our findings uncover a fundamental trade-off: DA improves discriminative focus on clean inputs but increases adversarial vulnerability, underscoring the need to jointly design for selectivity and robustness in future attention mechanisms.
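The subtractive structure at the heart of DA can be made concrete with a short sketch. Below is a minimal single-head differential attention block in PyTorch, loosely following the DiffAttn formulation (two softmax attention maps, the second scaled by a learnable λ and subtracted from the first); the scalar λ and the projection layout are simplifying assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F
from torch import nn

class DiffAttention(nn.Module):
    """Minimal single-head differential attention sketch (illustrative).

    Computes (softmax(Q1 K1^T / sqrt(d)) - lambda * softmax(Q2 K2^T / sqrt(d))) V.
    The subtraction is the structural feature the paper identifies as
    encouraging negative gradient alignment between the two branches.
    """

    def __init__(self, dim: int, lambda_init: float = 0.8):
        super().__init__()
        # Two independent query/key projections, one shared value projection.
        self.wq = nn.Linear(dim, 2 * dim, bias=False)
        self.wk = nn.Linear(dim, 2 * dim, bias=False)
        self.wv = nn.Linear(dim, dim, bias=False)
        # Scalar learnable subtraction weight (a simplification of the
        # reparameterized lambda used in DIFF Transformer implementations).
        self.lam = nn.Parameter(torch.tensor(lambda_init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        q1, q2 = self.wq(x).chunk(2, dim=-1)
        k1, k2 = self.wk(x).chunk(2, dim=-1)
        v = self.wv(x)
        scale = x.shape[-1] ** -0.5
        a1 = F.softmax(q1 @ k1.transpose(-2, -1) * scale, dim=-1)
        a2 = F.softmax(q2 @ k2.transpose(-2, -1) * scale, dim=-1)
        # Subtractive attention map: sharpens focus on clean inputs, but
        # its input gradient is the *difference* of the branch gradients.
        return (a1 - self.lam * a2) @ v

# Usage: attn = DiffAttention(dim=64); y = attn(torch.randn(2, 16, 64))
```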
Key Contributions
- Theoretical identification of 'negative gradient alignment' in Differential Attention as the structural cause of adversarial sensitivity amplification, formalized through local Lipschitz constant analysis (see the derivation sketched after this list)
- Empirical validation of the 'Fragile Principle' showing DiffViT and DiffCLIP exhibit higher adversarial attack success rates and gradient norms than standard attention counterparts across five datasets
- Discovery of a depth-dependent robustness crossover: stacking DA layers attenuates small perturbations via noise cancellation, but this protection collapses under larger attack budgets
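As a back-of-the-envelope illustration of the first contribution (the symbols g_1, g_2, λ, and f below are our shorthand, not the paper's notation): write the input gradient of a DA output as the difference of the gradients flowing through the two softmax branches.

```latex
\nabla_x f = g_1 - \lambda\, g_2,
\qquad
\|\nabla_x f\|^2 = \|g_1\|^2 - 2\lambda \langle g_1, g_2 \rangle + \lambda^2 \|g_2\|^2 .
```

When λ > 0 and the branches are negatively aligned (⟨g_1, g_2⟩ < 0), the cross term is positive, so the squared norm strictly exceeds ‖g_1‖² + λ²‖g_2‖²; taking a supremum over inputs then inflates the local Lipschitz estimate, which is the amplification mechanism the paper formalizes.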
🛡️ Threat Analysis
The paper characterizes the adversarial input-perturbation vulnerability of DA-based models (DiffViT, DiffCLIP), measuring higher attack success rates, increased gradient norms, and elevated local Lipschitz constants relative to standard-attention baselines. In doing so, it directly analyzes the adversarial-example threat surface created by the subtractive attention design.
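A minimal sketch of the kind of measurement described above, assuming a generic PyTorch image classifier; the one-step FGSM attack and the input-gradient-norm probe are standard stand-ins, not the paper's exact evaluation protocol.

```python
import torch
import torch.nn.functional as F

def fgsm_success_and_grad_norm(model, x, y, eps: float = 8 / 255):
    """Return (attack success rate, mean input-gradient norm) for one batch.

    The per-sample input-gradient norm is a cheap proxy for the local
    Lipschitz sensitivity the paper analyzes: a larger norm means a
    small perturbation can move the loss (and the prediction) further.
    """
    model.eval()
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    (grad,) = torch.autograd.grad(loss, x)

    # One-step FGSM perturbation inside an L-infinity ball of radius eps.
    x_adv = (x + eps * grad.sign()).clamp(0.0, 1.0).detach()
    with torch.no_grad():
        flipped = model(x_adv).argmax(dim=-1) != y
    grad_norm = grad.flatten(1).norm(dim=1).mean()
    return flipped.float().mean().item(), grad_norm.item()
```

Running this probe on matched ViT/DiffViT (or CLIP/DiffCLIP) pairs would reproduce the shape of the comparison reported in the paper: under the Fragile Principle, the DA variant is expected to show both a higher success rate and a larger gradient norm.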