Adversarial Robustness in Financial Machine Learning: Defenses, Economic Impact, and Governance Evidence
Published on arXiv
2512.15780
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
FGSM/PGD attacks reduce AUC by ~10.6% and increase expected portfolio loss by ~5%; adversarial training recovers most lost utility with minor calibration trade-offs.
Semantic Robustness Index (SRI)
Novel technique introduced
We evaluate the adversarial robustness of tabular machine learning models used in financial decision-making. Using credit scoring and fraud detection data, we apply gradient-based attacks and measure their impact on discrimination, calibration, and financial risk metrics. Results show notable performance degradation under small perturbations and partial recovery through adversarial training.
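The gradient-based attacks in question follow the standard FGSM recipe: perturb each input feature by ε in the sign of the loss gradient. The paper's models and data are not reproduced here; the sketch below is a minimal, self-contained illustration assuming a logistic-regression scorer, for which the binary cross-entropy gradient with respect to the inputs has the closed form (p − y)·w.

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps=0.05):
    """One-step FGSM against a logistic-regression scorer on tabular features.

    For binary cross-entropy, dL/dx = (sigmoid(x @ w + b) - y) * w, so the
    attack shifts each feature by eps in the sign of that gradient.
    """
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))   # predicted default probabilities
    grad_x = (p - y)[:, None] * w            # BCE gradient w.r.t. the inputs
    return x + eps * np.sign(grad_x)

# Toy demo (synthetic data, not the paper's): the perturbation pushes
# scores away from the true labels, inflating the model's loss.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 5))
w = rng.normal(size=5)
y = (x @ w > 0).astype(float)                # labels this linear model fits well
x_adv = fgsm_perturb(x, w, 0.0, y, eps=0.05)
```

PGD is the iterated variant: repeat the same step several times, projecting back into the ε-ball after each step.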
Key Contributions
- Dataset-agnostic adversarial robustness evaluation pipeline for tabular financial ML that extends beyond classification metrics to financial risk measures (Expected Loss, VaR, Expected Shortfall)
- Empirical evidence that small plausibility-bounded perturbations (ε=0.05) reduce AUC by ~10.6% and inflate expected portfolio loss by ~5%, with adversarial training recovering substantial utility
- Semantic Robustness Index (SRI) using SHAP attribution stability as an early-warning indicator for adversarial influence, detecting degradation before AUC decline is observed
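The paper does not publish the SRI formula; one plausible reading of "SHAP attribution stability" is the mean cosine similarity between per-sample attribution vectors computed on clean versus perturbed inputs. The sketch below is a hypothetical implementation under that assumption, using gradient-times-input attributions for a linear model as a stand-in for SHAP values (for linear models the two coincide up to a baseline shift).

```python
import numpy as np

def attribution_stability(phi_clean, phi_adv):
    """Hypothetical SRI-style score: mean cosine similarity between
    per-sample attribution vectors on clean vs. perturbed inputs.
    1.0 means explanations are unchanged; lower values flag drift
    in the model's reasoning before AUC visibly degrades."""
    num = np.sum(phi_clean * phi_adv, axis=1)
    den = (np.linalg.norm(phi_clean, axis=1)
           * np.linalg.norm(phi_adv, axis=1) + 1e-12)
    return float(np.mean(num / den))

# Toy demo on synthetic data (not the paper's experiment):
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 8))
w = rng.normal(size=8)
phi_clean = x * w                                      # gradient-x-input attributions
x_adv = x + 0.05 * np.sign(rng.normal(size=x.shape))   # stand-in perturbation
phi_adv = x_adv * w
sri = attribution_stability(phi_clean, phi_adv)
```

In practice one would substitute attributions from a SHAP explainer for the actual model; the stability score itself is model-agnostic.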
🛡️ Threat Analysis
The core contribution is applying gradient-based adversarial attacks (FGSM, PGD) to tabular ML models at inference time and evaluating adversarial training as a defense: a canonical input manipulation attack-and-defense evaluation.
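The financial risk measures the pipeline reports (Expected Loss, VaR, Expected Shortfall) have standard empirical definitions: VaR is the α-quantile of the portfolio loss distribution, and ES is the mean loss beyond VaR. The sketch below is a minimal illustration on simulated data; the portfolio parameters (PD range, LGD, exposures) are made up for the demo and are not from the paper.

```python
import numpy as np

def var_es(losses, alpha=0.99):
    """Empirical Value-at-Risk and Expected Shortfall at level alpha.

    VaR is the alpha-quantile of the loss distribution; ES is the
    average loss in the tail at or beyond VaR."""
    losses = np.sort(np.asarray(losses, dtype=float))
    var = float(np.quantile(losses, alpha))
    es = float(losses[losses >= var].mean())
    return var, es

# Toy credit portfolio (hypothetical parameters):
rng = np.random.default_rng(2)
pd_scores = rng.uniform(0.01, 0.2, size=1000)   # model-estimated default probabilities
ead = rng.uniform(1e3, 1e4, size=1000)          # exposure at default per loan
lgd = 0.6                                       # loss given default
expected_loss = float(np.sum(pd_scores * lgd * ead))

# Monte Carlo portfolio losses: Bernoulli defaults times severity.
defaults = rng.random((2000, 1000)) < pd_scores
sim_losses = defaults @ (lgd * ead)
var99, es99 = var_es(sim_losses, alpha=0.99)
```

An attack that inflates the scores feeding `pd_scores` propagates directly into Expected Loss, which is how a ~5% expected-portfolio-loss increase can follow from score manipulation alone.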