
The Double-Edged Nature of the Rashomon Set for Trustworthy Machine Learning

Ethan Hsu, Harry Chen, Chudi Zhong, Lesia Semenova

0 citations · 82 references · arXiv


Published on arXiv · 2511.21799

Input Manipulation Attack

OWASP ML Top 10 — ML01

Model Inversion Attack

OWASP ML Top 10 — ML03

Key Finding

Rashomon set diversity creates a measurable robustness-privacy trade-off: including more diverse near-optimal models improves adversarial accuracy, but it also monotonically decreases the attacker's dataset reconstruction error, i.e., increases training data leakage.

Reactive Robustness

Novel technique introduced


Real-world machine learning (ML) pipelines rarely produce a single model; instead, they produce a Rashomon set of many near-optimal ones. We show that this multiplicity reshapes key aspects of trustworthiness. At the individual-model level, sparse interpretable models tend to preserve privacy but are fragile to adversarial attacks. In contrast, the diversity within a large Rashomon set enables reactive robustness: even when an attack breaks one model, others often remain accurate. Rashomon sets are also stable under small distribution shifts. However, this same diversity increases information leakage, as disclosing more near-optimal models provides an attacker with progressively richer views of the training data. Through theoretical analysis and empirical studies of sparse decision trees and linear models, we characterize this robustness-privacy trade-off and highlight the dual role of Rashomon sets as both a resource and a risk for trustworthy ML.


Key Contributions

  • Proves that sparse interpretable models are inherently fragile to adversarial attacks but naturally limit training data leakage, establishing a fundamental single-model trade-off
  • Introduces 'reactive robustness' — leveraging Rashomon set diversity to find near-optimal replacement models that withstand attacks targeted at a specific deployed model
  • Characterizes the robustness-privacy trade-off empirically and theoretically: larger, more diverse Rashomon sets improve reactive robustness but monotonically increase training data reconstruction risk

🛡️ Threat Analysis

Input Manipulation Attack

The central finding is that sparse interpretable models are fundamentally fragile to adversarial attacks. As a defense, the paper proposes 'reactive robustness': when the deployed model is compromised by an adversarial input, switch to an alternative near-optimal member of the Rashomon set that still classifies the input correctly. Adversarial fragility and this defense are a primary contribution.
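The switching mechanism can be sketched as follows. This is an illustrative toy, not the paper's implementation: the `Stump` classifier, the pool structure, and the attack scenario are all assumptions made for the example.

```python
class Stump:
    """Toy near-optimal model: a one-feature threshold classifier
    (a stand-in for the paper's sparse trees / linear models)."""
    def __init__(self, feature, threshold):
        self.feature, self.threshold = feature, threshold

    def predict(self, x):
        return int(x[self.feature] > self.threshold)


def reactive_switch(pool, deployed, x_adv, y_true):
    """If the deployed model misclassifies the adversarial input,
    return a pool member that still classifies it correctly, if any."""
    if deployed.predict(x_adv) == y_true:
        return deployed
    for m in pool:
        if m is not deployed and m.predict(x_adv) == y_true:
            return m
    return None


# Two near-optimal models relying on different features; an attack
# crafted against feature 0 leaves the feature-1 model unaffected.
m0, m1 = Stump(0, 0.5), Stump(1, 0.5)
x_adv = [0.4, 0.9]  # perturbed so it crosses m0's threshold
replacement = reactive_switch([m0, m1], deployed=m0, x_adv=x_adv, y_true=1)
```

The design point is that the pool members, being near-optimal under different decision logic, are unlikely to share the exact vulnerability the attacker exploited.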

Model Inversion Attack

The paper explicitly analyzes training data reconstruction risk: disclosing more near-optimal models gives an attacker progressively richer views of the training data, with dataset reconstruction error measured empirically as the privacy risk metric. The threat model is an adversary trying to reconstruct training data from the disclosed models.
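The leakage effect can be illustrated with a toy abstraction, assuming each disclosed linear model leaks one linear view of a training record. This is not the paper's attack; the setup below exists only to show why reconstruction error shrinks monotonically as more near-optimal models are disclosed.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
x_true = rng.normal(size=d)  # stand-in for one training record

# Assumption for illustration: disclosing model i leaks a direction
# w_i and the value w_i @ x_true; the attacker aggregates the views
# with least squares.
W = rng.normal(size=(d, d))
errors = []
for k in range(1, d + 1):
    Wk, yk = W[:k], W[:k] @ x_true
    x_hat, *_ = np.linalg.lstsq(Wk, yk, rcond=None)  # min-norm solution
    errors.append(np.linalg.norm(x_hat - x_true))

# errors[k-1] is the reconstruction error after k disclosed models:
# it decreases toward zero as the views accumulate.
```

Each additional view constrains the attacker's feasible set further, which is the mechanism behind the monotone robustness-privacy trade-off the paper characterizes.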


Details

Domains
tabular
Model Types
traditional_ml
Threat Tags
white_box · inference_time · training_time
Datasets
COMPAS
Applications
criminal justice risk assessment · lending decisions · healthcare prediction