defense 2025

Delete and Retain: Efficient Unlearning for Document Classification

Aadya Goel 1, Mayuri Sridhar 2,3

0 citations · 29 references · arXiv

Published on arXiv

2512.13711

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Hessian Reassignment achieves retained-class accuracy close to full retraining while consistently lowering membership-inference advantage on the removed class, measured via pooled multi-shadow attacks.

Hessian Reassignment

Novel technique introduced


Machine unlearning aims to efficiently remove the influence of specific training data from a model without full retraining. While much progress has been made in unlearning for LLMs, document classification models remain relatively understudied. In this paper, we study class-level unlearning for document classifiers and present Hessian Reassignment, a two-step, model-agnostic solution. First, we perform a single influence-style update that subtracts the contribution of all training points from the target class by solving a Hessian-vector system with conjugate gradients, requiring only gradient and Hessian-vector products. Second, in contrast to common unlearning baselines that randomly reclassify deleted-class samples, we enforce a decision-space guarantee via Top-1 classification. On standard text benchmarks, Hessian Reassignment achieves retained-class accuracy close to full retrain-without-class while running orders of magnitude faster. Additionally, it consistently lowers membership-inference advantage on the removed class, measured with pooled multi-shadow attacks. These results demonstrate a practical, principled path to efficient class unlearning in document classification.
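The first step in the abstract, solving a Hessian-vector system with conjugate gradients, can be sketched as follows. This is a minimal illustration and not the authors' implementation. The names `conjugate_gradient`, `remove_class_update`, `hvp`, and `toy_hvp` are hypothetical, and the update uses the standard first-order influence form (Hessian of the mean training loss over `n_train` points, summed gradients of the removed class), which is an assumption; the paper's exact scaling and damping may differ.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(hvp, g, iters=50, tol=1e-10):
    """Approximately solve H x = g using only Hessian-vector products.

    `hvp(v)` must return H v for a symmetric positive-definite H.
    """
    x = [0.0] * len(g)
    r = list(g)            # residual g - H x, with x starting at 0
    p = list(r)            # current search direction
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:       # converged: squared residual norm is tiny
            break
        Hp = hvp(p)
        alpha = rs / dot(p, Hp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def remove_class_update(theta, hvp, class_grad_sum, n_train):
    """Assumed influence-style removal: theta + (1/n) * H^{-1} * sum of class gradients."""
    step = conjugate_gradient(hvp, class_grad_sum)
    return [t + s / n_train for t, s in zip(theta, step)]


# Toy check on a 2x2 system; the exact solution is [1/11, 7/11].
H = [[4.0, 1.0], [1.0, 3.0]]


def toy_hvp(v):
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in H]


solution = conjugate_gradient(toy_hvp, [1.0, 2.0])
```

In a real classifier, `hvp` would be computed by automatic differentiation rather than from an explicit matrix, which is what keeps the cost to gradient and Hessian-vector products only.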


Key Contributions

  • Formalizes class-level unlearning for document classifiers, including the decision-space ambiguity introduced by removing an entire label
  • Introduces Hessian Reassignment: a two-step procedure combining a class-level Hessian downweight update (via conjugate gradient) with deterministic top-1 reclassification over non-target labels
  • Demonstrates that Hessian Reassignment achieves retained-class accuracy close to retrain-without-class while consistently lowering membership-inference advantage on the removed class relative to random relabeling baselines
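The deterministic top-1 reclassification over non-target labels named in the contributions can be illustrated with a short sketch. The function names `reassign_top1` and `random_relabel` are hypothetical, and the code assumes the model exposes one score per class for each sample; it is meant only to contrast the two relabeling strategies, not to reproduce the paper's code.

```python
import random


def reassign_top1(scores, target):
    """Return the highest-scoring label among all labels except `target`."""
    candidates = [k for k in range(len(scores)) if k != target]
    return max(candidates, key=lambda k: scores[k])


def random_relabel(num_classes, target, rng):
    """Baseline for comparison: pick any non-target label uniformly at random."""
    candidates = [k for k in range(num_classes) if k != target]
    return rng.choice(candidates)


# A sample whose top prediction is the removed class (label 1).
scores = [0.1, 0.7, 0.2]
deterministic_label = reassign_top1(scores, target=1)
baseline_label = random_relabel(len(scores), target=1, rng=random.Random(0))
```

The deterministic rule always returns the same label for the same scores, whereas the random baseline depends on the seed and can pick a label the model considers unlikely.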

🛡️ Threat Analysis

Membership Inference Attack

The paper explicitly evaluates Hessian Reassignment as a defense against membership inference attacks on the removed class, measuring the reduction in MIA advantage with pooled multi-shadow attacks. This matches the adversarial threat model for ML04.
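To make the metric concrete, here is a minimal sketch of how a membership-inference advantage could be computed from attack scores. It assumes the common definition of advantage as the best achievable gap between true-positive and false-positive rates for a threshold attack, and it treats "pooling" as simply concatenating scores from several shadow models. Both choices, along with the names `mia_advantage` and `pool`, are assumptions; this summary does not specify the paper's exact protocol.

```python
def pool(per_shadow_scores):
    """Concatenate attack scores collected from several shadow models."""
    return [s for scores in per_shadow_scores for s in scores]


def mia_advantage(member_scores, nonmember_scores):
    """Max over thresholds of TPR - FPR for the rule 'score >= t means member'."""
    best = 0.0
    for t in sorted(set(member_scores) | set(nonmember_scores)):
        tpr = sum(s >= t for s in member_scores) / len(member_scores)
        fpr = sum(s >= t for s in nonmember_scores) / len(nonmember_scores)
        best = max(best, tpr - fpr)
    return best


# Hypothetical scores from two shadow models, pooled before thresholding.
members = pool([[0.9, 0.8], [0.7]])
nonmembers = pool([[0.2], [0.6, 0.1]])
advantage = mia_advantage(members, nonmembers)
```

An advantage near 0 means the attacker cannot separate removed-class training samples from unseen ones, which is the outcome an unlearning method aims for.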


Details

Domains
nlp
Model Types
traditional_ml
Threat Tags
training_time, inference_time, black_box
Datasets
standard text benchmarks
Applications
document classification, text classification