UTOPIA: Unlearnable Tabular Data via Decoupled Shortcut Embedding
Jiaming He 1, Fuming Luo 2, Hongwei Li 1, Wenbo Jiang 1, Wenshu Fan 1, Zhenbo Shi 3, Xudong Jiang 4, Yi Yu 4
1 University of Electronic Science and Technology of China
Published on arXiv
arXiv:2602.07358
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
UTOPIA drives unauthorized model training to near-random performance on diverse tabular datasets and architectures, outperforming existing unlearnable example baselines
UTOPIA
Novel technique introduced
Unlearnable examples (UE) have emerged as a practical mechanism to prevent unauthorized model training on private vision data, but extending this protection to tabular data is nontrivial. Tabular data in finance and healthcare is highly sensitive, yet existing UE methods transfer poorly because tabular features mix numerical and categorical constraints and exhibit saliency sparsity, with learning dominated by a few dimensions. Under a Spectral Dominance condition, we show certified unlearnability is feasible when the poison spectrum overwhelms the clean semantic spectrum. Guided by this, we propose Unlearnable Tabular Data via DecOuPled Shortcut EmbeddIng (UTOPIA), which exploits feature redundancy to decouple optimization into two channels: high-saliency features for semantic obfuscation and low-saliency redundant features for embedding a hyper-correlated shortcut, yielding constraint-aware dominant shortcuts while preserving tabular validity. Extensive experiments across tabular datasets and models show UTOPIA drives unauthorized training toward near-random performance, outperforming strong UE baselines and transferring well across architectures.
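The Spectral Dominance intuition, that the poison's spectrum must overwhelm the clean semantic spectrum, can be illustrated with a toy numpy sketch. The rank-1 poison construction, the scale factor, and the direct comparison of spectral norms below are illustrative assumptions, not the paper's exact condition:

```python
import numpy as np

rng = np.random.default_rng(0)

# Clean tabular features: 500 rows, 20 columns of Gaussian noise.
X_clean = rng.normal(size=(500, 20))

# Hypothetical rank-1 poison: all perturbation energy concentrated in a
# single dominant direction, scaled so its spectrum dwarfs the clean one.
delta = 5.0 * np.outer(rng.choice([-1, 1], 500), rng.normal(size=20))

# Spectral norm = largest singular value.
def spec(M):
    return np.linalg.svd(M, compute_uv=False)[0]

print(f"clean spectral norm : {spec(X_clean):.1f}")
print(f"poison spectral norm: {spec(delta):.1f}")
print("dominance holds:", spec(delta) > spec(X_clean))
```

Because the poison is rank-1, its entire norm sits in one direction, which is exactly the regime where a learner's gradient signal is captured by the perturbation rather than the clean semantics.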
Key Contributions
- Spectral Dominance condition providing theoretical certification that unlearnability is achievable when the poison spectral norm overwhelms the clean semantic signal
- Decoupled optimization splitting tabular features into two channels: high-saliency features for semantic obfuscation and low-saliency redundant features for embedding a hyper-correlated shortcut
- Constraint-aware perturbation generation that preserves tabular structural validity across mixed numerical and categorical feature spaces
🛡️ Threat Analysis
UTOPIA is a defensive application of training-data manipulation: the data owner injects perturbations (shortcut embeddings) that corrupt the learning signal so unauthorized models trained on the protected data achieve near-random performance. The core mechanism — poisoning training data to degrade model utility — maps directly to ML02, here deployed defensively by the data owner rather than adversarially by an attacker.