Why the Maximum Second Derivative of Activations Matters for Adversarial Robustness
Published on arXiv (2603.23860)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Optimal adversarial robustness consistently occurs when the maximum second derivative of the activation function, max|σ''|, falls between 4 and 10; this holds across diverse architectures, datasets, and adversarial training methods.
RCT-AF
Novel technique introduced
This work investigates the critical role of activation function curvature, quantified by the maximum second derivative $\max|\sigma''|$, in adversarial robustness. Using the Recursive Curvature-Tunable Activation Family (RCT-AF), which enables precise control over curvature through parameters $\alpha$ and $\beta$, we systematically analyze this relationship. Our study reveals a fundamental trade-off: insufficient curvature limits model expressivity, while excessive curvature amplifies the normalized Hessian diagonal norm of the loss, leading to sharper minima that hinder robust generalization. The result is a non-monotonic relationship in which optimal adversarial robustness consistently occurs when $\max|\sigma''|$ falls within 4 to 10, a finding that holds across diverse network architectures, datasets, and adversarial training methods. We provide theoretical insights into how activation curvature affects the diagonal elements of the Hessian matrix of the loss, and experimentally demonstrate that the normalized Hessian diagonal norm exhibits a U-shaped dependence on $\max|\sigma''|$, with its minimum falling within the optimal robustness range, thereby validating the proposed mechanism.
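The abstract does not reproduce RCT-AF's functional form, so the sketch below uses a scaled softplus as a hypothetical stand-in for a curvature-tunable activation: $\sigma_\beta(x) = \frac{1}{\beta}\log(1 + e^{\beta x})$, for which $\max|\sigma''| = \beta/4$ analytically. It estimates $\max|\sigma''|$ numerically via central finite differences, showing how one might verify that a chosen parameter lands in the paper's reported 4 to 10 range.

```python
import numpy as np

# NOTE: this is NOT the paper's RCT-AF (its exact form is not given here);
# scaled softplus is just a simple activation with tunable curvature.
def softplus(x, beta):
    # logaddexp(0, z) is a numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, beta * x) / beta

def max_abs_second_derivative(f, beta, lo=-2.0, hi=2.0, n=4001, h=1e-4):
    # Central finite-difference estimate of max|sigma''| over a grid.
    x = np.linspace(lo, hi, n)
    d2 = (f(x + h, beta) - 2.0 * f(x, beta) + f(x - h, beta)) / h**2
    return float(np.abs(d2).max())

# For scaled softplus, sigma''(x) = beta * s * (1 - s) with s = sigmoid(beta*x),
# so max|sigma''| = beta/4; beta in [16, 40] lands in the reported 4-10 range.
for beta in (16.0, 24.0, 40.0):
    est = max_abs_second_derivative(softplus, beta)
    print(f"beta={beta:5.1f}  max|sigma''| ~ {est:.3f}  (analytic: {beta / 4:.2f})")
```

The same finite-difference probe works for any smooth activation, so it can serve as a quick check when sweeping a curvature parameter.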
Key Contributions
- Introduces RCT-AF (Recursive Curvature-Tunable Activation Family) enabling precise control over activation curvature
- Discovers a non-monotonic relationship between activation curvature and robustness, with an optimal range of max|σ''| between 4 and 10
- Provides theoretical analysis linking activation curvature to Hessian diagonal norm and demonstrates U-shaped dependence
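The theoretical link between activation curvature and the Hessian diagonal can be illustrated on a single neuron. For $L(w) = \ell(\sigma(wx))$, the chain rule gives $\frac{d^2L}{dw^2} = \ell''(a)\,(\sigma'(z)x)^2 + \ell'(a)\,\sigma''(z)\,x^2$, so $\sigma''$ enters the Hessian entry directly. The sketch below checks this identity against a finite-difference estimate; the scaled-softplus activation and all numeric values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical setup: one neuron, squared-error loss, scaled-softplus
# activation (a stand-in for RCT-AF, whose form is not given here).
BETA, X, TARGET = 8.0, 1.3, 0.5

def sigma(z):
    # Scaled softplus: (1/beta) * log(1 + exp(beta*z)), computed stably.
    return np.logaddexp(0.0, BETA * z) / BETA

def loss(w):
    # l(a) = 0.5 * (a - target)^2 on the single activation a = sigma(w*X).
    return 0.5 * (sigma(w * X) - TARGET) ** 2

def d2loss_analytic(w):
    z = w * X
    s = 1.0 / (1.0 + np.exp(-BETA * z))   # sigma'(z) = sigmoid(beta*z)
    d2sig = BETA * s * (1.0 - s)          # sigma''(z), peak value beta/4
    a = sigma(z)
    # Chain rule: l''*(sigma'*x)^2 + l'*sigma''*x^2, with l''=1, l'=a-target.
    return (s * X) ** 2 + (a - TARGET) * d2sig * X ** 2

w, h = 0.2, 1e-4
fd = (loss(w + h) - 2.0 * loss(w) + loss(w - h)) / h**2
print(f"finite-diff d2L/dw2 = {fd:.6f}   chain rule = {d2loss_analytic(w):.6f}")
```

The $\ell'(a)\,\sigma''(z)\,x^2$ term is what grows with activation curvature, consistent with the paper's claim that excessive $\max|\sigma''|$ inflates the Hessian diagonal norm.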
🛡️ Threat Analysis
The paper studies adversarial robustness and adversarial training methods, analyzing how activation function properties affect a model's resistance to adversarial examples.