Data Reconstruction: Identifiability and Optimization with Sample Splitting

Training data reconstruction from KKT conditions has shown striking empirical success, yet it remains unclear when the resulting KKT equations have a unique solution and, even in identifiable regimes, how to recover that solution reliably by optimization. This work addresses these two complementary questions: identifiability and optimization. On the identifiability side, we give sufficient conditions under which the KKT system of a two-layer network with polynomial activations uniquely determines the training data, providing a theoretical explanation of when and why reconstruction is possible. On the optimization side, we introduce sample splitting, a curvature-aware refinement step applicable to general reconstruction objectives (not limited to KKT-based formulations): it creates additional descent directions that help escape poor stationary points and refine solutions. Experiments demonstrate that augmenting several existing reconstruction methods with sample splitting consistently improves reconstruction performance.
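To make the notion of a KKT system concrete, the sketch below evaluates the stationarity residual that KKT-based reconstruction typically minimizes over candidate samples and multipliers. It is an illustration under stated assumptions, not the paper's implementation: the summary above does not spell out the equations, so the code assumes the usual stationarity form theta = sum_i lam_i * y_i * grad_theta f(theta; x_i) for a bias-free two-layer network with a cubic activation. The names forward, param_grads, and kkt_residual are hypothetical, and sample splitting itself is not sketched here because its procedure is not described in this summary.

```python
import numpy as np


def forward(W, a, X):
    """Two-layer network f(x) = sum_j a_j * (w_j . x)**3, evaluated on each row of X.

    Assumed architecture: cubic activation, no biases (illustrative choice).
    """
    return ((X @ W.T) ** 3) @ a


def param_grads(W, a, X):
    """Per-sample gradients of f with respect to W (n, m, d) and a (n, m)."""
    Z = X @ W.T                                          # (n, m) pre-activations
    gW = 3.0 * (Z ** 2 * a)[:, :, None] * X[:, None, :]  # d f / d w_j = 3 a_j (w_j.x)^2 x
    ga = Z ** 3                                          # d f / d a_j = (w_j.x)^3
    return gW, ga


def kkt_residual(W, a, X, y, lam):
    """Squared norm of theta - sum_i lam_i * y_i * grad_theta f(theta; x_i).

    A reconstruction attack would treat X (and lam) as unknowns and drive
    this quantity toward zero for the fixed, trained parameters (W, a).
    """
    gW, ga = param_grads(W, a, X)
    c = lam * y                                          # (n,) signed multipliers
    rW = W - np.einsum("i,imd->md", c, gW)               # residual in first-layer weights
    ra = a - c @ ga                                      # residual in output weights
    return float(np.sum(rW ** 2) + np.sum(ra ** 2))


if __name__ == "__main__":
    # Hand-built one-neuron, one-sample instance that satisfies stationarity.
    s = 1.0 / np.sqrt(3.0)
    W, a = np.array([[1.0]]), np.array([s])
    y, lam = np.array([1.0]), np.array([s])
    print(kkt_residual(W, a, np.array([[1.0]]), y, lam))  # true sample
    print(kkt_residual(W, a, np.array([[2.0]]), y, lam))  # wrong candidate
```

In this toy instance the residual vanishes at the true sample and is strictly positive at the wrong candidate. Identifiability asks whether the zero set of this residual contains only the true training data, while the optimization question is whether a descent method on it actually reaches that point.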

Key Contributions

Identifiability analysis proving that two-layer networks with polynomial activations of degree ≥3 allow exact training sample recovery from KKT equations under moderate width conditions
Sample splitting: a curvature-aware optimization refinement step that creates additional descent directions to escape poor stationary points in the nonconvex reconstruction objective
Empirical validation showing sample splitting consistently improves reconstruction quality when augmenting several existing KKT-based reconstruction methods

🛡️ Threat Analysis

Model Inversion Attack

The paper directly advances training data reconstruction attacks: it analyzes when private training samples are uniquely recoverable from model parameters (identifiability), and it introduces 'sample splitting' to make reconstruction optimization more effective. Both contributions serve an adversary reconstructing training data from model weights.