defense 2026

FG-OrIU: Towards Better Forgetting via Feature-Gradient Orthogonality for Incremental Unlearning

Qian Feng 1, JiaHang Tu 1, Mintong Kang 2, Hanbin Zhao 1, Chao Zhang 1, Hui Qian 1

3 citations · 1 influential · 85 references · arXiv


Published on arXiv: 2601.13578

Model Inversion Attack

OWASP ML Top 10 — ML03

Key Finding

FG-OrIU achieves 'deep forgetting' where DIP-based reconstruction from forgotten class features produces only noise, whereas existing unlearning methods leave residual recoverable information yielding blurred but recognizable images.

FG-OrIU

Novel technique introduced


Incremental unlearning (IU) is critical for pre-trained models that must comply with sequential data-deletion requests, yet existing methods primarily suppress parameters or confuse knowledge without explicit constraints at both the feature and gradient levels, resulting in *superficial forgetting*, where residual information remains recoverable. This incomplete forgetting risks security breaches and disrupts the balance between removal and retention, especially in IU scenarios. We propose FG-OrIU (**F**eature-**G**radient **Or**thogonality for **I**ncremental **U**nlearning), the first framework to unify orthogonal constraints at both the feature and gradient levels to achieve deep forgetting, where the forgetting effect is irreversible. FG-OrIU decomposes the feature space via Singular Value Decomposition (SVD), separating forgetting-class and remaining-class features into distinct subspaces. It then enforces dual constraints: feature orthogonal projection on both forgetting and remaining classes, and gradient orthogonal projection that prevents both the reintroduction of forgotten knowledge and disruption to the remaining classes during updates. Additionally, dynamic subspace adaptation merges newly forgetting subspaces and contracts remaining subspaces, ensuring a stable balance between removal and retention across sequential unlearning tasks. Extensive experiments demonstrate the effectiveness of the method.
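The SVD-based subspace separation described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the feature matrix, the subspace rank `k`, and the variable names are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature matrix for the forgetting class: n samples x d dims.
F_forget = rng.standard_normal((32, 16))

# SVD of the forgetting-class features; the top-k right singular vectors
# span the (assumed) forgetting subspace.
_, _, Vt = np.linalg.svd(F_forget, full_matrices=False)
k = 4                      # assumed subspace rank (a hyperparameter)
U_f = Vt[:k].T             # d x k orthonormal basis of the forgetting subspace

# Projector onto the orthogonal complement of the forgetting subspace.
P_perp = np.eye(16) - U_f @ U_f.T

# A feature passed through P_perp retains no component in the
# forgetting subspace, so nothing there is left to reconstruct.
x = rng.standard_normal(16)
x_clean = P_perp @ x
print(np.allclose(U_f.T @ x_clean, 0))  # → True: residual is zero
```

The same projector idea underlies the feature orthogonal projection constraint: once features are confined to the complement of the forgetting subspace, the forgotten directions carry no signal.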


Key Contributions

  • FG-OrIU framework enforcing dual orthogonal constraints at both feature and gradient levels to achieve irreversible 'deep forgetting' rather than superficial feature degradation
  • SVD-based feature space decomposition that separates forgetting and remaining class subspaces with orthogonal projection constraints preventing knowledge re-entanglement
  • Dynamic subspace adaptation mechanism that merges newly forgotten subspaces and contracts remaining subspaces across sequential unlearning tasks
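The gradient-level constraint from the contributions above can also be sketched: gradients are projected so that updates neither re-enter the forgetting subspace nor disturb the remaining-class subspace. The bases and the helper `project_grad` below are illustrative assumptions; in the paper the bases would come from the SVD of class features.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8

# Hypothetical orthonormal bases (columns) for the forgetting and
# remaining subspaces, built here from a QR factorization so the two
# blocks are mutually orthogonal.
Q, _ = np.linalg.qr(rng.standard_normal((d, 4)))
U_forget, U_remain = Q[:, :2], Q[:, 2:4]

def project_grad(g, U_forget, U_remain):
    """Strip gradient components that would reintroduce forgotten
    knowledge or perturb remaining-class directions (a sketch)."""
    g = g - U_forget @ (U_forget.T @ g)   # block re-learning forgotten dirs
    g = g - U_remain @ (U_remain.T @ g)   # protect remaining-class dirs
    return g

g = rng.standard_normal(d)
g_safe = project_grad(g, U_forget, U_remain)
# g_safe now has zero component along both subspaces, so a gradient
# step cannot undo forgetting or damage retention in those directions.
```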

🛡️ Threat Analysis

Model Inversion Attack

The paper frames 'superficial forgetting' as a security risk in which an adversary can reconstruct forgotten training data from residual features (demonstrated via Deep Image Prior reconstruction). FG-OrIU explicitly defends against this by making reconstruction from forgotten-class features produce only noise, directly addressing data recovery from model internals. A retrain-the-head experiment further shows that residual information is adversarially exploitable, and the paper positions deep unlearning as a defense against this recovery.
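The retrain-the-head probe mentioned above can be illustrated with synthetic data: fit a fresh linear head on frozen features and treat its accuracy as a proxy for residual recoverable information. Everything here is a toy assumption, including the signal direction `v` and the least-squares head; it only shows why accuracy drops to near chance once the signal-carrying subspace is projected out.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 8, 200

# Toy "forgotten class" signal: the label is encoded along a single
# feature direction v (an assumption for this illustration).
v = np.zeros(d)
v[0] = 1.0
y = rng.integers(0, 2, n)
X = rng.standard_normal((n, d)) * 0.1 + np.outer(2 * y - 1, v)

def head_accuracy(X, y):
    """Retrain a least-squares linear head on frozen features and
    report its accuracy -- a proxy for residual recoverable info."""
    w, *_ = np.linalg.lstsq(X, 2.0 * y - 1.0, rcond=None)
    return np.mean((X @ w > 0) == (y == 1))

acc_before = head_accuracy(X, y)      # residual info: near-perfect accuracy
P = np.eye(d) - np.outer(v, v)        # project out the signal direction
acc_after = head_accuracy(X @ P, y)   # near chance: nothing left to recover
```

A high `acc_before` corresponds to superficial forgetting (the head recovers the class), while `acc_after` collapsing toward 0.5 corresponds to the deep-forgetting goal.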


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
training_time, white_box
Applications
image classification, incremental class unlearning, pre-trained vision model governance