defense · 2026

Shadow Unlearning: A Neuro-Semantic Approach to Fidelity-Preserving Faceless Forgetting in LLMs

Dinesh Srivasthav P 1,2, Ashok Urlana 1,2, Rahul Mishra 2, Bala Mallikarjunarao Garlapati 1, Ponnurangam Kumaraguru 2

0 citations · 77 references · Published on arXiv: 2601.04275

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

NSPU outperforms standard unlearning baselines on knowledge retention and forgetting efficacy while being at least 10× more computationally efficient, with privacy validated via membership inference attack evaluation.

NSPU (Neuro-Semantic Projector Unlearning)

Novel technique introduced


Machine unlearning aims to selectively remove the influence of specific training samples to satisfy privacy regulations such as the GDPR's 'Right to be Forgotten'. However, many existing methods require access to the data being removed, exposing it to membership inference attacks and potential misuse of Personally Identifiable Information (PII). We address this critical challenge by proposing Shadow Unlearning, a novel approximate-unlearning paradigm that performs machine unlearning on anonymized forget data without exposing PII. We further propose Neuro-Semantic Projector Unlearning (NSPU), a novel privacy-preserving framework that realizes Shadow Unlearning. To evaluate our method, we compile the Multi-domain Fictitious Unlearning (MuFU) forget set across five diverse domains and introduce an evaluation stack to quantify the trade-off between knowledge retention and unlearning effectiveness. Experimental results on various LLMs show that NSPU achieves superior unlearning performance, preserves model utility, and enhances user privacy. The proposed approach is also at least 10 times more computationally efficient than standard unlearning approaches. Our findings open a new direction for privacy-aware machine unlearning that balances data protection and model fidelity.


Key Contributions

  • Shadow Unlearning paradigm: performs machine unlearning on anonymized forget sets, eliminating the need to expose raw PII during the unlearning process
  • NSPU framework: learns a neuro-semantic projector mapping anonymized activations back to original-data activation space, enabling construction of an unlearning filter without direct PII access
  • MuFU dataset and evaluation stack covering five domains to benchmark trade-offs between unlearning effectiveness and knowledge retention, with MIA-based privacy validation
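The projector contribution above can be illustrated with a minimal sketch. The paper does not publish its architecture here, so the following assumes the simplest possible instantiation: a linear map, fit by least squares, from activations of anonymized forget-set text into the activation space of the original (PII-bearing) text. All dimensions, variable names, and the synthetic data are hypothetical.

```python
import numpy as np

# Hedged sketch of the neuro-semantic projector idea (not the authors' code):
# learn a map P from activations of anonymized forget-set text to the
# activation space of the original text, so an unlearning filter can be
# built without ever feeding raw PII through the model again.

rng = np.random.default_rng(0)
d = 16    # hypothetical hidden-activation dimension
n = 200   # number of paired (anonymized, original) activation samples

H_anon = rng.normal(size=(n, d))           # activations of anonymized text
W_true = rng.normal(size=(d, d))           # unknown shift anonymization induces
H_orig = H_anon @ W_true + 0.01 * rng.normal(size=(n, d))  # original-text activations

# Fit projector P by least squares: minimize ||H_anon @ P - H_orig||^2
P, *_ = np.linalg.lstsq(H_anon, H_orig, rcond=None)

# At unlearning time, project anonymized activations into the original space
# and use them in place of the raw forget data.
H_proj = H_anon @ P
err = np.linalg.norm(H_proj - H_orig) / np.linalg.norm(H_orig)
print(f"relative projection error: {err:.4f}")
```

In the actual framework the projector would presumably be a learned (possibly nonlinear) module over LLM hidden states; the point of the sketch is only the direction of the mapping: anonymized activations in, original-space activations out, with no raw PII required.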

🛡️ Threat Analysis

Membership Inference Attack

The core threat model is that existing unlearning methods expose forget-set PII to membership inference attacks by requiring raw data access; NSPU defends against this by using anonymized forget sets and is explicitly validated against MIA to confirm the forgotten data cannot be re-identified.
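The MIA validation described above can be sketched as a loss-threshold attack, a common baseline (the paper's exact attack is not specified here). The losses below are synthetic and illustrative: after successful unlearning, the model's per-sample loss on forget data should be statistically indistinguishable from its loss on unseen data, driving the attacker's accuracy toward chance.

```python
import numpy as np

# Hedged sketch of a loss-threshold membership inference attack (MIA):
# classify a sample as a training member if its loss falls below a threshold,
# sweeping thresholds to find the attacker's best accuracy. Accuracy near
# 0.5 (chance) on forget vs. unseen data suggests the forgotten samples
# are not re-identifiable from the unlearned model.

rng = np.random.default_rng(1)

# Hypothetical per-sample losses; identical distributions model a model
# that has genuinely forgotten the forget set.
loss_forget = rng.normal(loc=2.0, scale=0.5, size=500)   # forget set, post-unlearning
loss_unseen = rng.normal(loc=2.0, scale=0.5, size=500)   # held-out non-members

def mia_accuracy(member_losses, nonmember_losses):
    """Best threshold attack: predict 'member' if loss < t."""
    losses = np.concatenate([member_losses, nonmember_losses])
    labels = np.concatenate([np.ones_like(member_losses),
                             np.zeros_like(nonmember_losses)])
    best = 0.0
    for t in np.quantile(losses, np.linspace(0, 1, 101)):
        acc = np.mean((losses < t) == labels)
        best = max(best, acc)
    return best

acc = mia_accuracy(loss_forget, loss_unseen)
print(f"best MIA accuracy: {acc:.3f} (0.5 = chance)")
```

With overlapping loss distributions, the best threshold accuracy stays close to chance; a model that had not unlearned would show systematically lower losses on the forget set, and the attack accuracy would climb well above 0.5.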


Details

Domains: nlp
Model Types: llm, transformer
Threat Tags: training_time, black_box
Datasets: MuFU (Multi-domain Fictitious Unlearning, authors' own)
Applications: llm privacy compliance, gdpr right-to-be-forgotten, privacy-preserving unlearning