Auto-ART: Structured Literature Synthesis and Automated Adversarial Robustness Testing
Published on arXiv
2604.20704
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
RDI pre-screening identifies gradient masking in 92% of flagged cases; multi-norm evaluation reveals 23.5 pp gap between average and worst-case robustness on SOTA models
Auto-ART
Novel technique introduced
Adversarial robustness evaluation underpins every claim of trustworthy ML deployment, yet the field suffers from fragmented protocols and undetected gradient masking. We make two contributions. (1) Structured synthesis. We analyze a corpus of nine peer-reviewed sources (2020-2026) through seven complementary protocols, producing the first end-to-end structured analysis of the field's consensus and unresolved challenges. (2) Auto-ART framework. We introduce Auto-ART, an open-source framework that operationalizes the identified gaps with 50+ attacks, 28 defense modules, the Robustness Diagnostic Index (RDI), and gradient-masking detection. It supports multi-norm evaluation (l1/l2/linf/semantic/spatial) and compliance mapping to NIST AI RMF, OWASP LLM Top 10, and the EU AI Act. Empirical validation on RobustBench demonstrates that Auto-ART's pre-screening identifies gradient masking in 92% of flagged cases, and that RDI rankings correlate highly with rankings from full AutoAttack evaluation. Multi-norm evaluation exposes a 23.5 pp gap between average and worst-case robustness on state-of-the-art models. No prior work combines such structured meta-scientific analysis with an executable evaluation framework that turns gaps identified in the literature into engineering practice.
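The gap between average and worst-case robustness can be illustrated with a small sketch. It assumes, which the abstract does not state, that "average" means the mean of the per-norm robust accuracies and that "worst-case" counts a sample as robust only if it survives the attack under every threat model. The function name `multi_norm_summary` and the toy outcomes are hypothetical and are not taken from Auto-ART.

```python
from typing import Dict, List


def multi_norm_summary(robust: Dict[str, List[bool]]) -> Dict[str, float]:
    """Summarize per-sample attack outcomes across several threat models.

    robust[norm][i] is True when sample i is still classified correctly
    after the attack run under that norm (hypothetical input layout).
    """
    norms = list(robust)
    if not norms:
        raise ValueError("need at least one threat model")
    n = len(robust[norms[0]])
    if n == 0 or any(len(v) != n for v in robust.values()):
        raise ValueError("every threat model needs the same non-empty sample set")

    # Robust accuracy of each threat model taken on its own.
    per_norm = {k: sum(v) / n for k, v in robust.items()}
    # Assumed "average": mean of the per-norm robust accuracies.
    average = sum(per_norm.values()) / len(per_norm)
    # Assumed "worst-case": a sample counts only if it survives every norm.
    worst_case = sum(all(robust[k][i] for k in norms) for i in range(n)) / n

    return {
        "average": average,
        "worst_case": worst_case,
        "gap_pp": 100.0 * (average - worst_case),
    }


# Toy outcomes for four samples under three threat models.
outcomes = {
    "linf": [True, True, False, True],
    "l2": [True, False, True, True],
    "l1": [False, True, True, True],
}
summary = multi_norm_summary(outcomes)
print(summary)
```

In this toy case each norm on its own reports 75% robust accuracy, yet only one of the four samples survives all three attacks, so the worst-case figure is 25% and the gap is 50 pp. Under these assumptions, a single-norm or averaged figure can overstate robustness.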
Key Contributions
- Structured meta-scientific analysis of adversarial robustness literature across 9 peer-reviewed sources using 7 complementary protocols
- Auto-ART framework with 50+ attacks, 28 defenses, RDI pre-screening (30x faster), FOSC gradient-masking detection, and multi-norm evaluation
- Compliance mapping to NIST AI RMF, OWASP LLM Top 10, and EU AI Act with CI/CD integration via SARIF 2.1.0 output
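As a rough illustration of what SARIF 2.1.0 output for CI/CD could look like, the sketch below wraps a list of findings in a minimal SARIF log. Only the top-level SARIF structure (`version`, `runs`, `tool.driver`, `results`) follows the standard. The function name `findings_to_sarif`, the tool name, the rule identifier, and the finding text are hypothetical placeholders and do not reflect Auto-ART's actual output.

```python
import json
from typing import Dict, List

# Result levels defined by SARIF 2.1.0.
SARIF_LEVELS = {"none", "note", "warning", "error"}


def findings_to_sarif(findings: List[Dict[str, str]],
                      tool_name: str = "robustness-eval-sketch") -> dict:
    """Wrap findings in a minimal SARIF 2.1.0 log containing a single run.

    Each finding is a dict with the hypothetical keys
    'rule_id', 'level' and 'message'.
    """
    for f in findings:
        if f["level"] not in SARIF_LEVELS:
            raise ValueError(f"unknown SARIF level: {f['level']}")

    # One rule descriptor per distinct rule id used by the findings.
    rule_ids = sorted({f["rule_id"] for f in findings})
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [
            {
                "tool": {
                    "driver": {
                        "name": tool_name,
                        "rules": [{"id": rid} for rid in rule_ids],
                    }
                },
                "results": [
                    {
                        "ruleId": f["rule_id"],
                        "level": f["level"],
                        "message": {"text": f["message"]},
                    }
                    for f in findings
                ],
            }
        ],
    }


# Hypothetical finding; the rule id and wording are placeholders.
example_findings = [
    {
        "rule_id": "GRADIENT_MASKING_SUSPECTED",
        "level": "warning",
        "message": "Pre-screening flagged possible gradient masking.",
    }
]
sarif_log = findings_to_sarif(example_findings)
print(json.dumps(sarif_log, indent=2))
```

A CI job could write this JSON to a file and upload it to any code-scanning service that accepts SARIF, which is what makes the format a convenient integration point.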
🛡️ Threat Analysis
The primary focus is adversarial robustness evaluation. The framework implements 50+ evasion attacks (such as FGSM, PGD, and AutoAttack) across multiple norms, evaluates defense mechanisms against adversarial examples, and detects gradient masking in adversarial training pipelines.
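For readers unfamiliar with the evasion attacks named above, here is a minimal sketch of one-step FGSM under an l-infinity budget. It is applied to a toy logistic-regression classifier so that the input gradient can be written by hand. The model, weights, and helper names are hypothetical and are not drawn from Auto-ART. The sketch omits clipping to a valid input range, which a real image attack would add.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm_linf(w: List[float], b: float, x: List[float],
              y: int, eps: float) -> List[float]:
    """One-step FGSM: x_adv = x + eps * sign(grad_x loss).

    For binary cross-entropy on a logistic model, the gradient of the
    loss with respect to the input is (p - y) * w.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    # sign(g) is -1, 0, or +1.
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


# Toy model and input (hypothetical values).
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1

x_adv = fgsm_linf(w, b, x, y, eps=0.5)
print("clean p(y=1):      ", round(predict_proba(w, b, x), 3))
print("adversarial p(y=1):", round(predict_proba(w, b, x_adv), 3))
```

With these values the clean input is classified as class 1, while the perturbed input, which differs by at most 0.5 per coordinate, falls on the other side of the decision boundary. PGD repeats this kind of signed-gradient step several times with a smaller step size, projecting back into the eps-ball after each step.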