FIT: Defying Catastrophic Forgetting in Continual LLM Unlearning
Xiaoyu Xu 1, Minxin Du 1, Kun Fang 1, Zi Liang 1, Yaxin Xiao 1, Zhicong Huang 2, Cheng Hong 2, Qingqing Ye 1, Haibo Hu 1
Published on arXiv
2601.21682
Sensitive Information Disclosure
OWASP LLM Top 10 — LLM06
Key Finding
FIT achieves the strongest forgetting-utility trade-off across four LLMs with hundreds of sequential deletion requests while remaining resistant to both relearning and quantization recovery attacks.
FIT
Novel technique introduced
Large language models (LLMs) demonstrate impressive capabilities across diverse tasks but raise concerns about privacy, copyright, and harmful materials. Existing LLM unlearning methods rarely consider the continual and high-volume nature of real-world deletion requests, which can cause utility degradation and catastrophic forgetting as requests accumulate. To address this challenge, we introduce FIT, a framework for continual unlearning that handles large numbers of deletion requests while maintaining robustness against both catastrophic forgetting and post-unlearning recovery. FIT mitigates degradation through rigorous data Filtering, Importance-aware updates, and Targeted layer attribution, enabling stable performance across long sequences of unlearning operations and achieving a favorable balance between forgetting effectiveness and utility retention. To support realistic evaluation, we present PCH, a benchmark covering Personal information, Copyright, and Harmful content in sequential deletion scenarios, along with two symmetric metrics, Forget Degree (F.D.) and Retain Utility (R.U.), which jointly assess forgetting quality and utility preservation. Extensive experiments on four open-source LLMs with hundreds of deletion requests show that FIT achieves the strongest trade-off between F.D. and R.U., surpasses existing methods on MMLU, CommonsenseQA, and GSM8K, and remains resistant against both relearning and quantization recovery attacks.
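The idea of an importance-aware update can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration (not FIT's actual algorithm, whose details are in the paper): parameters that matter most for retained knowledge, scored by a Fisher-style squared-gradient proxy on retain data, are frozen, while the remaining parameters take a gradient-ascent step on the forget loss.

```python
import numpy as np

def importance_masked_update(params, forget_grad, retain_grad,
                             lr=0.1, keep_frac=0.5):
    """Illustrative importance-aware unlearning step (assumption, not FIT).

    Parameters with large squared retain-gradient (a Fisher-information
    proxy) are frozen; the rest ascend the forget-loss gradient, so
    forgetting is concentrated in parameters the retain set relies on least.
    """
    importance = retain_grad ** 2                   # per-parameter importance proxy
    threshold = np.quantile(importance, keep_frac)  # freeze the top (1-keep_frac) share
    mask = importance <= threshold                  # True = safe to modify
    return params + lr * mask * forget_grad         # masked gradient-ascent step

rng = np.random.default_rng(0)
params = rng.normal(size=8)
forget_grad = rng.normal(size=8)
retain_grad = rng.normal(size=8)
new_params = importance_masked_update(params, forget_grad, retain_grad)

# Parameters deemed important for retention are left untouched.
frozen = (retain_grad ** 2) > np.quantile(retain_grad ** 2, 0.5)
print(np.allclose(new_params[frozen], params[frozen]))  # True
```

In a continual setting, one motivation for such masking is that each deletion request perturbs only a small, low-importance subset of weights, which limits the cumulative drift that causes catastrophic forgetting.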
Key Contributions
- FIT framework for continual LLM unlearning using data Filtering, Importance-aware updates, and Targeted layer attribution to resist catastrophic forgetting across hundreds of sequential deletion requests
- PCH benchmark covering Personal information, Copyright, and Harmful content for realistic sequential unlearning evaluation, with symmetric Forget Degree and Retain Utility metrics
- Demonstrated resistance against post-unlearning adversarial recovery via both relearning (fine-tuning) and quantization attacks across four open-source LLMs
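Since the paper reports a joint trade-off between Forget Degree and Retain Utility, a natural way to summarize both in one number is a harmonic mean, analogous to an F1 score. This combination is our illustrative assumption; the paper's exact F.D. and R.U. definitions and any aggregate score are specified there.

```python
def tradeoff_score(forget_degree: float, retain_utility: float) -> float:
    """Harmonic-mean trade-off between F.D. and R.U., both in [0, 1].

    Hypothetical aggregate (not the paper's metric): it rewards methods
    that forget well AND keep utility, and punishes collapse in either.
    """
    if forget_degree + retain_utility == 0:
        return 0.0
    return 2 * forget_degree * retain_utility / (forget_degree + retain_utility)

print(tradeoff_score(0.9, 0.9))            # 0.9  -- balanced method
print(round(tradeoff_score(1.0, 0.1), 3))  # 0.182 -- forgets everything, loses utility
```

The harmonic mean makes the intended failure mode explicit: a method that erases the forget set by destroying the model scores near zero, just like one that retains utility by not forgetting anything.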