
No Prior, No Leakage: Revisiting Reconstruction Attacks in Trained Neural Networks

Yehonatan Refael 1, Guy Smorodinsky 2, Ofir Lindenbaum 3, Itay Safran 2

0 citations · 33 references

Published on arXiv: 2509.21296

Model Inversion Attack

OWASP ML Top 10 — ML03

Key Finding

Without prior knowledge of the data, reconstruction attacks admit infinitely many valid solutions lying arbitrarily far from the true training set, and networks that satisfy implicit bias conditions more strongly are, paradoxically, harder to attack.


The memorization of training data by neural networks raises pressing concerns for privacy and security. Recent work has shown that, under certain conditions, portions of the training set can be reconstructed directly from model parameters. Some of these methods exploit implicit bias toward margin maximization, suggesting that properties often regarded as beneficial for generalization may actually compromise privacy. Yet despite striking empirical demonstrations, the reliability of these attacks remains poorly understood and lacks a solid theoretical foundation. In this work, we take a complementary perspective: rather than designing stronger attacks, we analyze the inherent weaknesses and limitations of existing reconstruction methods and identify conditions under which they fail. We rigorously prove that, without incorporating prior knowledge about the data, there exist infinitely many alternative solutions that may lie arbitrarily far from the true training set, rendering reconstruction fundamentally unreliable. Empirically, we further demonstrate that exact duplication of training examples occurs only by chance. Our results refine the theoretical understanding of when training set leakage is possible and offer new insights into mitigating reconstruction attacks. Remarkably, we demonstrate that networks trained more extensively, and therefore satisfying implicit bias conditions more strongly, are in fact less susceptible to reconstruction attacks, reconciling privacy with the need for strong generalization in this setting.
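The margin-maximization attacks the abstract refers to recover candidate training points by driving a KKT stationarity residual of the trained parameters to zero. The sketch below (an assumed minimal setup, not the paper's code) does this for a linear model, with the dual coefficients held fixed for clarity; it also shows why the recovered points are unreliable — the residual reaches zero on a whole subspace of candidate sets.

```python
import numpy as np

# Illustrative sketch (assumed setup; not the paper's code) of a KKT-based
# reconstruction objective. For a linear model f(x) = <w, x> at a
# max-margin solution, stationarity gives
#     w = sum_i lam_i * y_i * x_i,   lam_i >= 0.
# An attacker holding only w searches for candidate points x_i that drive
# the residual ||w - sum_i lam_i * y_i * x_i||^2 to zero.

rng = np.random.default_rng(0)
d, m = 3, 4                        # input dimension, number of candidates
w = rng.normal(size=d)             # "trained" parameters given to the attacker
y = np.array([1.0, -1.0, 1.0, -1.0])
lam = np.ones(m)                   # dual coefficients, held fixed for clarity

x = rng.normal(size=(m, d))        # candidate "training points"
lr = 0.1
for _ in range(200):
    resid = w - (lam * y) @ x                      # stationarity residual
    x -= lr * (-2.0 * np.outer(lam * y, resid))    # gradient step on ||resid||^2

# The residual vanishes, yet x is just one of infinitely many minimizers:
# with m*d unknowns and only d equations, the zero set of the objective is
# a whole affine subspace, none of which need resemble real training data.
print(np.linalg.norm(w - (lam * y) @ x))
```

Because the constraint system is badly underdetermined, which candidate set the attack lands on depends entirely on initialization — exactly the non-uniqueness the paper formalizes.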


Key Contributions

  • Proves that without prior knowledge, infinitely many alternative solutions exist that are arbitrarily far from the true training set, making reconstruction fundamentally unreliable
  • Provides constructive techniques (point merging/splitting lemmas) to generate alternative global minima of the reconstruction objective indistinguishable from the real training data
  • Demonstrates empirically and theoretically that networks trained more extensively (stronger implicit bias) are actually less susceptible to reconstruction attacks, reconciling generalization and privacy
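The point-merging lemma listed above can be illustrated concretely. In the linear case below (a hypothetical toy example, not the paper's construction), two duplicated training points with dual coefficients lam_i and lam_j can be collapsed into one point carrying lam_i + lam_j, and the KKT stationarity equation is satisfied identically — so the merged dataset is an equally valid global minimum of the reconstruction objective.

```python
import numpy as np

# Toy demonstration (illustrative, not the paper's code) of point merging.
# Stationarity for a linear max-margin model:  w = sum_i lam_i * y_i * x_i.

# "True" training set: note x[1] and x[2] are duplicates with the same label.
x = np.array([[1.0, 2.0], [0.5, -1.0], [0.5, -1.0], [-2.0, 0.3]])
y = np.array([1.0, -1.0, -1.0, 1.0])
lam = np.array([0.7, 0.4, 0.9, 0.2])      # nonnegative dual coefficients

w = (lam * y) @ x                         # w = sum_i lam_i * y_i * x_i

# Point-merging: collapse the duplicated pair into a single point carrying
# the summed coefficient. The merged set satisfies the same equation, so a
# prior-free reconstruction cannot tell the two datasets apart.
x_merged = np.array([[1.0, 2.0], [0.5, -1.0], [-2.0, 0.3]])
y_merged = np.array([1.0, -1.0, 1.0])
lam_merged = np.array([0.7, 0.4 + 0.9, 0.2])

w_merged = (lam_merged * y_merged) @ x_merged

print(np.allclose(w, w_merged))           # both datasets induce identical w
```

Running the split in reverse (one point into two with divided coefficients) generates ever more alternative minima, which is how the paper constructs infinitely many indistinguishable solutions.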

🛡️ Threat Analysis

Model Inversion Attack

The paper rigorously analyzes training data reconstruction attacks — adversaries attempting to recover private training examples from model parameters — and proves conditions under which these attacks inherently fail due to solution non-uniqueness.


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box, training_time
Applications
image classification