Defense · 2026

ANML: Attribution-Native Machine Learning with Guaranteed Robustness

Oliver Zahn 1, Matt Beton 2,3, Simran Chana 2,3

0 citations · 21 references · arXiv (Cornell University)


Published on arXiv

2602.11690

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

Quality-weighted training with 20% high-quality data outperforms uniform weighting of 100% of the data by 47%, and the framework maintains robustness under strategic joint attacks where gradient-only methods fail entirely.

ANML (Attribution-Native Machine Learning)

Novel technique introduced


Frontier AI systems increasingly train on specialized expert data, from clinical records to proprietary research to curated datasets, yet current training pipelines treat all samples identically. A Nobel laureate's contribution receives the same weight as an unverified submission. We introduce ANML (Attribution-Native Machine Learning), a framework that weights training samples by four quality factors: gradient-based consistency (q), verification status (v), contributor reputation (r), and temporal relevance (T). By combining what the model observes (gradient signals) with what the system knows about data provenance (external signals), ANML produces per-contributor quality weights that simultaneously improve model performance and enable downstream attribution. Across 5 datasets (178–32,561 samples), ANML achieves 33–72% error reduction over gradient-only baselines. Quality-weighted training is data-efficient: 20% high-quality data outperforms 100% uniformly weighted data by 47%. A Two-Stage Adaptive gating mechanism guarantees that ANML never underperforms the best available baseline, including under strategic joint attacks combining credential faking with gradient alignment. When per-sample detection fails against subtle corruption, contributor-level attribution provides 1.3–5.3× greater improvement than sample-level methods, with the advantage growing as corruption becomes harder to detect.
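The abstract names the four quality factors (q, v, r, T) but this summary does not give the combination rule. The sketch below assumes a simple multiplicative combination normalized over samples; the function name, the multiplicative rule, and the normalization are illustrative assumptions, not the paper's stated formula.

```python
import numpy as np

def anml_weights(q, v, r, T, eps=1e-12):
    """Hypothetical per-sample weight from the four ANML factors:
    gradient consistency (q), verification status (v), contributor
    reputation (r), temporal relevance (T). Multiplicative rule is
    an assumption -- any low factor collapses the sample's weight."""
    q, v, r, T = (np.asarray(x, dtype=float) for x in (q, v, r, T))
    w = q * v * r * T
    return w / (w.sum() + eps)  # normalize to a distribution over samples

# Example: the second sample is unverified (v=0.2), so its weight collapses
weights = anml_weights(q=[0.9, 0.9], v=[1.0, 0.2], r=[0.8, 0.8], T=[1.0, 1.0])
```

A multiplicative rule makes the factors act as soft AND gates: a faked credential (low v) cannot be compensated for by an aligned gradient (high q), which is the intuition behind defending the joint attack described later.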


Key Contributions

  • Multi-factor sample weighting framework integrating gradient-based quality (q), verification status (v), contributor reputation (r), and temporal relevance (T) into training
  • Two-Stage Adaptive Gating mechanism that provably never underperforms the best available baseline, including under joint credential-faking + gradient-alignment poisoning attacks
  • Empirical demonstration that contributor-level attribution provides 1.3–5.3× greater benefit than sample-level detection when per-sample corruption is subtle
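The non-degradation guarantee of the gating mechanism can be sketched as held-out model selection: train the ANML-weighted model alongside the baselines, then deploy whichever validates best. The function name, the dict-based API, and the selection rule are illustrative; the paper's actual two-stage procedure is not detailed in this summary.

```python
def two_stage_gate(candidates, validate):
    """Select the best-validating candidate model. Because the ANML
    model competes against every baseline on held-out data, the
    deployed model never underperforms the best available baseline
    (up to validation noise) -- a sketch of the guarantee, not the
    paper's algorithm."""
    scores = {name: validate(model) for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    return best, candidates[best]

# Toy usage: "models" are stand-in accuracy numbers, validation returns them
candidates = {"anml": 0.91, "gradient_only": 0.84, "uniform": 0.79}
best_name, best_model = two_stage_gate(candidates, validate=lambda m: m)
```

Under a successful joint attack the ANML score would drop below a baseline's, and the gate would simply fall back to that baseline, which is what "never underperforms" requires.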

🛡️ Threat Analysis

Data Poisoning Attack

ANML directly defends against training-time data poisoning by weighting samples using gradient consistency, verification status, and contributor reputation. The paper explicitly models 'strategic joint attacks combining credential faking with gradient alignment' and evaluates against them, positioning the framework relative to Byzantine-robust aggregation methods (Krum, Trimmed Mean, Bulyan). The Two-Stage Adaptive gating guarantees non-degradation under such attacks.
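The gradient-consistency signal referenced above can be illustrated with a common robust-training heuristic: score each per-sample gradient by its cosine similarity to the mean gradient, so poisoned samples that pull against the consensus score low. This estimator is an assumption for illustration; the paper's exact q computation is not given in this summary.

```python
import numpy as np

def gradient_consistency(grads, eps=1e-12):
    """Illustrative q factor: cosine similarity of each per-sample
    gradient to the mean gradient, mapped from [-1, 1] to [0, 1]."""
    grads = np.asarray(grads, dtype=float)
    mean_g = grads.mean(axis=0)
    cos = (grads @ mean_g) / (
        np.linalg.norm(grads, axis=1) * np.linalg.norm(mean_g) + eps
    )
    return np.clip((cos + 1.0) / 2.0, 0.0, 1.0)

# A poisoned sample whose gradient opposes the consensus scores near 0
scores = gradient_consistency([[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0]])
```

A gradient-aligned poison defeats this signal by construction, which is why ANML pairs it with the external v and r factors: the attacker must fake credentials and align gradients simultaneously, the joint attack the gating mechanism is evaluated against.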


Details

Domains
tabular · federated-learning
Model Types
traditional_ml · federated
Threat Tags
training_time · grey_box · targeted
Datasets
UCI datasets (Wine, Iris, Adult, MNIST subset, and one additional; 178–32,561 samples)
Applications
training data quality weighting · data provenance attribution · federated learning with heterogeneous contributors