Adversarial Robustness in Financial Machine Learning: Defenses, Economic Impact, and Governance Evidence
Published on arXiv
2512.15780
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
FGSM/PGD attacks reduce AUC by ~10.6% and increase expected portfolio loss by ~5%; adversarial training recovers most lost utility with minor calibration trade-offs.
Semantic Robustness Index (SRI)
Novel technique introduced
We evaluate the adversarial robustness of tabular machine learning models used in financial decision-making. Using credit scoring and fraud detection data, we apply gradient-based attacks and measure their impact on discrimination, calibration, and financial risk metrics. Results show notable performance degradation under small perturbations and partial recovery through adversarial training.
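The gradient-based attacks in question follow the standard FGSM recipe: perturb each input feature by ε in the sign of the loss gradient. The paper's models and data are not reproduced here; the sketch below is a minimal, self-contained illustration assuming a logistic-regression scorer, for which the binary cross-entropy gradient with respect to the inputs has the closed form (p − y)·w.

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps=0.05):
    """One-step FGSM against a logistic-regression scorer on tabular features.

    For binary cross-entropy, dL/dx = (sigmoid(x @ w + b) - y) * w, so the
    attack shifts each feature by eps in the sign of that gradient.
    """
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))   # predicted default probabilities
    grad_x = (p - y)[:, None] * w            # BCE gradient w.r.t. the inputs
    return x + eps * np.sign(grad_x)

# Toy demo (synthetic data, not the paper's): the perturbation pushes
# scores away from the true labels, inflating the model's loss.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 5))
w = rng.normal(size=5)
y = (x @ w > 0).astype(float)                # labels this linear model fits well
x_adv = fgsm_perturb(x, w, 0.0, y, eps=0.05)
```

PGD is the iterated variant: repeat the same step several times, projecting back into the ε-ball after each step.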
Key Contributions
- Dataset-agnostic adversarial robustness evaluation pipeline for tabular financial ML that extends beyond classification metrics to financial risk measures (Expected Loss, VaR, Expected Shortfall)
- Empirical evidence that small plausibility-bounded perturbations (ε=0.05) reduce AUC by ~10.6% and inflate expected portfolio loss by ~5%, with adversarial training recovering substantial utility
- Semantic Robustness Index (SRI) using SHAP attribution stability as an early-warning indicator for adversarial influence, detecting degradation before AUC decline is observed
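The paper does not publish the SRI formula; one plausible reading of "SHAP attribution stability" is the mean cosine similarity between per-sample attribution vectors computed on clean versus perturbed inputs. The sketch below is a hypothetical implementation under that assumption, using gradient-times-input attributions for a linear model as a stand-in for SHAP values (for linear models the two coincide up to a baseline shift).

```python
import numpy as np

def attribution_stability(phi_clean, phi_adv):
    """Hypothetical SRI-style score: mean cosine similarity between
    per-sample attribution vectors on clean vs. perturbed inputs.
    1.0 means explanations are unchanged; lower values flag drift
    in the model's reasoning before AUC visibly degrades."""
    num = np.sum(phi_clean * phi_adv, axis=1)
    den = (np.linalg.norm(phi_clean, axis=1)
           * np.linalg.norm(phi_adv, axis=1) + 1e-12)
    return float(np.mean(num / den))

# Toy demo on synthetic data (not the paper's experiment):
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 8))
w = rng.normal(size=8)
phi_clean = x * w                                      # gradient-x-input attributions
x_adv = x + 0.05 * np.sign(rng.normal(size=x.shape))   # stand-in perturbation
phi_adv = x_adv * w
sri = attribution_stability(phi_clean, phi_adv)
```

In practice one would substitute attributions from a SHAP explainer for the actual model; the stability score itself is model-agnostic.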
🛡️ Threat Analysis
The core contribution is applying gradient-based adversarial attacks (FGSM, PGD) to tabular ML models at inference time and evaluating adversarial training as a defense: a canonical input manipulation attack-and-defense evaluation.
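The financial risk measures the pipeline reports (Expected Loss, VaR, Expected Shortfall) have standard empirical definitions: VaR is the α-quantile of the portfolio loss distribution, and ES is the mean loss beyond VaR. The sketch below is a minimal illustration on simulated data; the portfolio parameters (PD range, LGD, exposures) are made up for the demo and are not from the paper.

```python
import numpy as np

def var_es(losses, alpha=0.99):
    """Empirical Value-at-Risk and Expected Shortfall at level alpha.

    VaR is the alpha-quantile of the loss distribution; ES is the
    average loss in the tail at or beyond VaR."""
    losses = np.sort(np.asarray(losses, dtype=float))
    var = float(np.quantile(losses, alpha))
    es = float(losses[losses >= var].mean())
    return var, es

# Toy credit portfolio (hypothetical parameters):
rng = np.random.default_rng(2)
pd_scores = rng.uniform(0.01, 0.2, size=1000)   # model-estimated default probabilities
ead = rng.uniform(1e3, 1e4, size=1000)          # exposure at default per loan
lgd = 0.6                                       # loss given default
expected_loss = float(np.sum(pd_scores * lgd * ead))

# Monte Carlo portfolio losses: Bernoulli defaults times severity.
defaults = rng.random((2000, 1000)) < pd_scores
sim_losses = defaults @ (lgd * ead)
var99, es99 = var_es(sim_losses, alpha=0.99)
```

An attack that inflates the scores feeding `pd_scores` propagates directly into Expected Loss, which is how a ~5% expected-portfolio-loss increase can follow from score manipulation alone.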