Defense · 2026

The Unseen Threat: Residual Knowledge in Machine Unlearning under Perturbed Samples

Hsiang Hsu, Pradeep Niroula, Zichang He, Ivan Brugere, Freddy Lecue, Chun-Fu Chen

1 citation · 79 references · arXiv

Published on arXiv · 2601.22359

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Over 7% of forget samples on CIFAR-10 exhibit residual knowledge under perturbation norm ~0.03 across existing unlearning methods; RURK effectively suppresses this vulnerability while maintaining accuracy on retained data.

RURK

Novel technique introduced


Machine unlearning offers a practical alternative to full model re-training by approximately removing the influence of specific user data. While existing methods certify unlearning via statistical indistinguishability from re-trained models, these guarantees do not naturally extend to model outputs when inputs are adversarially perturbed. In particular, slightly perturbed forget samples may still be correctly recognized by the unlearned model, even when a re-trained model fails to do so, revealing a novel privacy risk: information about the forget samples may persist in their local neighborhood. In this work, we formalize this vulnerability as residual knowledge and show that it is inevitable in high-dimensional settings. To mitigate this risk, we propose a fine-tuning strategy, named RURK, that penalizes the model's ability to re-recognize perturbed forget samples. Experiments on vision benchmarks with deep neural networks demonstrate that residual knowledge is prevalent across existing unlearning methods and that our approach effectively suppresses it.


Key Contributions

  • Formalizes 'residual knowledge' — the vulnerability where unlearned models still correctly classify adversarially perturbed forget samples, even when re-trained models fail to do so
  • Proves via geometric probability that this disagreement is inevitable in high-dimensional input spaces, even under certified approximate unlearning
  • Proposes RURK, a fine-tuning strategy that penalizes the model's ability to recognize perturbed forget samples, demonstrating effectiveness across multiple unlearning algorithms on vision benchmarks
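
The paper's exact RURK objective is not reproduced in this summary. The following is a minimal NumPy sketch of the general idea under stated assumptions: a toy linear classifier is fine-tuned with a standard cross-entropy term on retained data plus a penalty that pushes predictions on randomly perturbed copies of forget samples toward uniform. The uniform-target penalty and the random (rather than adversarial) perturbations are our illustrative choices, not necessarily the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 16, 3

# Separable "retain" data: points clustered around class means (toy setup)
means = rng.normal(size=(k, d)) * 3.0
y_retain = rng.integers(0, k, 60)
X_retain = means[y_retain] + rng.normal(size=(60, d))

# A few "forget" samples whose local neighborhoods we want flattened
X_forget = rng.normal(size=(4, d))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

W = np.zeros((d, k))
onehot = np.eye(k)[y_retain]
eps, lam, lr = 0.03, 1.0, 0.05  # perturbation radius, penalty weight, step size

for _ in range(400):
    # Retain term: cross-entropy (gradient w.r.t. logits is p - y)
    p_r = softmax(X_retain @ W)
    g = X_retain.T @ (p_r - onehot) / len(X_retain)
    # Penalty term: cross-entropy against a uniform target on perturbed
    # forget samples (gradient w.r.t. logits is p - 1/k)
    X_pert = X_forget + rng.uniform(-eps, eps, X_forget.shape)
    p_f = softmax(X_pert @ W)
    g += lam * X_pert.T @ (p_f - 1.0 / k) / len(X_pert)
    W -= lr * g

retain_acc = (softmax(X_retain @ W).argmax(1) == y_retain).mean()
forget_conf = softmax(X_forget @ W).max(1).mean()
print(f"retain acc={retain_acc:.2f}, forget max-prob={forget_conf:.2f}")
```

After fine-tuning, the toy model keeps high accuracy on retained data while its confidence on forget samples collapses toward the uniform 1/k, mirroring the paper's reported trade-off of suppressing residual knowledge without hurting retained accuracy.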

🛡️ Threat Analysis

Membership Inference Attack

The core vulnerability is a novel form of membership inference: an adversary uses adversarially perturbed versions of forget samples to demonstrate that the unlearned model still retains knowledge of those samples (correctly predicting them when a properly re-trained model cannot). This constitutes a privacy attack that reveals whether specific data points were in the training set — the defining threat of ML04. RURK is a defense that directly prevents this form of membership inference under perturbation.
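
The probe described above can be sketched as follows. This is a hypothetical simplification: the paper uses adversarial perturbations, whereas this sketch samples the ε-ball at random, and `residual_knowledge_probe`, the toy threshold models, and all parameters are our own illustrative names, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def residual_knowledge_probe(unlearned, retrained, x, y_true, eps=0.03, trials=200):
    """Sample the eps-infinity ball around forget sample x; flag residual
    knowledge if some perturbation is still classified correctly by the
    unlearned model while the retrained model gets it wrong."""
    for _ in range(trials):
        x_pert = x + rng.uniform(-eps, eps, size=x.shape)
        if unlearned(x_pert) == y_true and retrained(x_pert) != y_true:
            return True
    return False

# Toy threshold models (hypothetical, for illustration only): the "unlearned"
# model still places the forget sample's neighborhood on the correct side of
# its boundary; the "retrained" model behaves as if x was never seen.
x_forget, y_forget = np.full(8, 0.1), 1
unlearned = lambda v: int(v.sum() > 0.0)
retrained = lambda v: int(v.sum() > 5.0)

print(residual_knowledge_probe(unlearned, retrained, x_forget, y_forget))  # True
```

A model that genuinely forgot the sample (e.g. the retrained model probed against itself) would produce no such disagreement, so the probe returns False.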


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box, inference_time
Datasets
CIFAR-10
Applications
image classification, machine unlearning