Defense · 2026

ANML: Attribution-Native Machine Learning with Guaranteed Robustness

Oliver Zahn 1, Matt Beton 2,3, Simran Chana 2,3

0 citations · 21 references · arXiv (Cornell University)


Published on arXiv

2602.11690

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

Quality-weighted training with 20% high-quality data outperforms uniform weighting of 100% of the data by 47%, and the framework maintains robustness under strategic joint attacks where gradient-only methods fail entirely.

ANML (Attribution-Native Machine Learning)

Novel technique introduced


Frontier AI systems increasingly train on specialized expert data, from clinical records to proprietary research to curated datasets, yet current training pipelines treat all samples identically. A Nobel laureate's contribution receives the same weight as an unverified submission. We introduce ANML (Attribution-Native Machine Learning), a framework that weights training samples by four quality factors: gradient-based consistency (q), verification status (v), contributor reputation (r), and temporal relevance (T). By combining what the model observes (gradient signals) with what the system knows about data provenance (external signals), ANML produces per-contributor quality weights that simultaneously improve model performance and enable downstream attribution. Across 5 datasets (178–32,561 samples), ANML achieves 33–72% error reduction over gradient-only baselines. Quality-weighted training is data-efficient: 20% high-quality data outperforms 100% uniformly weighted data by 47%. A Two-Stage Adaptive gating mechanism guarantees that ANML never underperforms the best available baseline, including under strategic joint attacks combining credential faking with gradient alignment. When per-sample detection fails against subtle corruption, contributor-level attribution provides 1.3–5.3× greater improvement than sample-level methods, with the advantage growing as corruption becomes harder to detect.
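The abstract names the four quality factors (q, v, r, T) but this summary does not give the combination rule. The sketch below assumes a simple multiplicative combination normalized over samples; the function name, the multiplicative rule, and the normalization are illustrative assumptions, not the paper's stated formula.

```python
import numpy as np

def anml_weights(q, v, r, T, eps=1e-12):
    """Hypothetical per-sample weight from the four ANML factors:
    gradient consistency (q), verification status (v), contributor
    reputation (r), temporal relevance (T). Multiplicative rule is
    an assumption -- any low factor collapses the sample's weight."""
    q, v, r, T = (np.asarray(x, dtype=float) for x in (q, v, r, T))
    w = q * v * r * T
    return w / (w.sum() + eps)  # normalize to a distribution over samples

# Example: the second sample is unverified (v=0.2), so its weight collapses
weights = anml_weights(q=[0.9, 0.9], v=[1.0, 0.2], r=[0.8, 0.8], T=[1.0, 1.0])
```

A multiplicative rule makes the factors act as soft AND gates: a faked credential (low v) cannot be compensated for by an aligned gradient (high q), which is the intuition behind defending the joint attack described later.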


Key Contributions

  • Multi-factor sample weighting framework integrating gradient-based quality (q), verification status (v), contributor reputation (r), and temporal relevance (T) into training
  • Two-Stage Adaptive Gating mechanism that provably never underperforms the best available baseline, including under joint credential-faking + gradient-alignment poisoning attacks
  • Empirical demonstration that contributor-level attribution provides 1.3–5.3× greater benefit than sample-level detection when per-sample corruption is subtle
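The non-degradation guarantee of the gating mechanism can be sketched as held-out model selection: train the ANML-weighted model alongside the baselines, then deploy whichever validates best. The function name, the dict-based API, and the selection rule are illustrative; the paper's actual two-stage procedure is not detailed in this summary.

```python
def two_stage_gate(candidates, validate):
    """Select the best-validating candidate model. Because the ANML
    model competes against every baseline on held-out data, the
    deployed model never underperforms the best available baseline
    (up to validation noise) -- a sketch of the guarantee, not the
    paper's algorithm."""
    scores = {name: validate(model) for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    return best, candidates[best]

# Toy usage: "models" are stand-in accuracy numbers, validation returns them
candidates = {"anml": 0.91, "gradient_only": 0.84, "uniform": 0.79}
best_name, best_model = two_stage_gate(candidates, validate=lambda m: m)
```

Under a successful joint attack the ANML score would drop below a baseline's, and the gate would simply fall back to that baseline, which is what "never underperforms" requires.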

🛡️ Threat Analysis

Data Poisoning Attack

ANML directly defends against training-time data poisoning by weighting samples using gradient consistency, verification status, and contributor reputation. The paper explicitly models 'strategic joint attacks combining credential faking with gradient alignment' and evaluates against them, positioning the framework relative to Byzantine-robust aggregation methods (Krum, Trimmed Mean, Bulyan). The Two-Stage Adaptive gating guarantees non-degradation under such attacks.
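The gradient-consistency signal referenced above can be illustrated with a common robust-training heuristic: score each per-sample gradient by its cosine similarity to the mean gradient, so poisoned samples that pull against the consensus score low. This estimator is an assumption for illustration; the paper's exact q computation is not given in this summary.

```python
import numpy as np

def gradient_consistency(grads, eps=1e-12):
    """Illustrative q factor: cosine similarity of each per-sample
    gradient to the mean gradient, mapped from [-1, 1] to [0, 1]."""
    grads = np.asarray(grads, dtype=float)
    mean_g = grads.mean(axis=0)
    cos = (grads @ mean_g) / (
        np.linalg.norm(grads, axis=1) * np.linalg.norm(mean_g) + eps
    )
    return np.clip((cos + 1.0) / 2.0, 0.0, 1.0)

# A poisoned sample whose gradient opposes the consensus scores near 0
scores = gradient_consistency([[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0]])
```

A gradient-aligned poison defeats this signal by construction, which is why ANML pairs it with the external v and r factors: the attacker must fake credentials and align gradients simultaneously, the joint attack the gating mechanism is evaluated against.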


Details

Domains
tabular · federated-learning
Model Types
traditional_ml · federated
Threat Tags
training_time · grey_box · targeted
Datasets
UCI datasets (Wine, Iris, Adult, MNIST subset, and one additional; 178–32,561 samples)
Applications
training data quality weighting · data provenance attribution · federated learning with heterogeneous contributors