The Measure of Deception: An Analysis of Data Forging in Machine Unlearning
Rishabh Dixit, Yuan Hui, Rayan Saab
Published on arXiv
arXiv:2509.05865
Membership Inference Attack
OWASP ML Top 10 — ML04
Key Finding
The Lebesgue measure of gradient-forging sets scales as ε^((d-r)/2), proving that adversarial forging of unlearning trajectories is statistically anomalous and in principle detectable under non-degenerate data distributions.
ε-forging set analysis
Novel technique introduced
Motivated by privacy regulations and the need to mitigate the effects of harmful data, machine unlearning seeks to modify trained models so that they effectively "forget" designated data. A key challenge in verifying unlearning is forging: adversarially crafting data that mimics the gradient of a target point, thereby creating the appearance of unlearning without actually removing information. To capture this phenomenon, we consider the collection of data points whose gradients approximate a target gradient within tolerance $ε$, which we call an $ε$-forging set, and develop a framework for its analysis. For linear regression and one-layer neural networks, we show that the Lebesgue measure of this set is small: it scales on the order of $ε$ and, for sufficiently small $ε$, on the order of $ε^d$. More generally, under mild regularity assumptions, we prove that the forging-set measure decays as $ε^{(d-r)/2}$, where $d$ is the data dimension and $r<d$ is the nullity of a variation matrix defined by the model gradients. Extensions to batch SGD and to almost-everywhere smooth loss functions yield the same asymptotic scaling. In addition, we establish probability bounds showing that, under non-degenerate data distributions, the likelihood of randomly sampling a forging point is vanishingly small. Together, these results provide evidence that adversarial forging is fundamentally limited and that false unlearning claims can, in principle, be detected.
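The linear-regression case in the abstract can be illustrated numerically. The sketch below is my own illustration, not code from the paper: it fixes a linear model with squared loss, picks a target point, and uses Monte Carlo sampling over a bounded box of candidate points to estimate the relative measure of the ε-forging set. Shrinking ε shrinks the estimated fraction, consistent with the claimed small-measure scaling. The dimension, box bounds, and sample count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2                          # data dimension (illustrative)
w = rng.standard_normal(d)     # fixed model parameters

# Target point (x*, y*) whose gradient an adversary would try to forge;
# for the squared loss 0.5*(w.x - y)^2, the gradient w.r.t. w is (w.x - y)*x.
x_star = rng.standard_normal(d)
y_star = 1.0
g_star = (x_star @ w - y_star) * x_star

def forging_fraction(eps, n=200_000):
    """Monte Carlo estimate of the fraction of the box [-2, 2]^(d+1)
    of candidate points (x, y) whose gradient lies within eps of the
    target gradient, i.e. the relative measure of the eps-forging set."""
    x = rng.uniform(-2.0, 2.0, size=(n, d))
    y = rng.uniform(-2.0, 2.0, size=n)
    g = (x @ w - y)[:, None] * x          # per-candidate gradients
    dist = np.linalg.norm(g - g_star, axis=1)
    return float(np.mean(dist <= eps))

for eps in (1.0, 0.3, 0.1):
    print(f"eps={eps}: estimated forging fraction = {forging_fraction(eps):.5f}")
```

The printed fractions decrease sharply with ε, which is the qualitative content of the paper's measure bounds in this toy setting.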
Key Contributions
- Introduces the ε-forging set — the collection of data points whose gradients approximate a target gradient within tolerance ε — and proves its Lebesgue measure scales as ε^((d-r)/2), where d is data dimension and r is the nullity of a variation matrix
- Shows that under non-degenerate data distributions the probability of randomly sampling a forging point is vanishingly small, making gradient-forging attacks statistically brittle
- Extends the analysis to batch SGD and almost-everywhere smooth loss functions, obtaining the same asymptotic scaling and implying that false unlearning claims can in principle be detected via distributional anomaly tests
🛡️ Threat Analysis
The paper's adversary is the model trainer itself, which attempts to defeat unlearning verification: audits that use gradient-based or MIA-style checks to confirm data erasure. By proving that the forging set has vanishingly small Lebesgue measure, the paper establishes that false unlearning claims are, in principle, detectable, directly informing the reliability of membership-inference-based unlearning audits.
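From the auditor's side, the measure bounds suggest a simple sanity check: a replacement point claimed to stand in for erased data should only very rarely reproduce the target's gradient by chance. A minimal, hypothetical sketch (the function name, threshold, and gradient vectors are illustrative, not from the paper):

```python
import numpy as np

def is_eps_forging(grad_candidate, grad_target, eps):
    """Auditor-side check: does the candidate gradient fall inside
    the eps-forging set of the target gradient?"""
    return bool(np.linalg.norm(grad_candidate - grad_target) <= eps)

rng = np.random.default_rng(1)
g_target = rng.standard_normal(5)   # gradient of the point claimed erased
g_random = rng.standard_normal(5)   # gradient of an unrelated data point

# Per the paper's probability bounds, a randomly drawn point lands in the
# eps-forging set with vanishingly small probability.
print(is_eps_forging(g_random, g_target, eps=0.1))
print(is_eps_forging(g_target, g_target, eps=0.1))
```

This check alone does not implement the paper's distributional anomaly test, but it captures the quantity the bounds control: the chance that honest data accidentally looks like a forged gradient.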