Quantization-Robust LLM Unlearning via Low-Rank Adaptation
João Vitor Boer Abitante 1, Joana Meneguzzo Pasquali 1, Luan Fonseca Garcia 1, Ewerton de Oliveira 2, Thomas da Silva Paula 2, Rodrigo C. Barros 1,3, Lucas S. Kupssinskü 1
Published on arXiv
2602.13151
Membership Inference Attack
OWASP ML Top 10 — ML04
Sensitive Information Disclosure
OWASP LLM Top 10 — LLM06
Key Finding
LoRA-based unlearning survives 4-bit PTQ with up to a 7.93-point utility improvement and substantially reduced MIA-based privacy leakage compared to full-parameter fine-tuning on Llama-2-7B
Quantization-Robust Unlearning via LoRA
Novel technique introduced
Large Language Model (LLM) unlearning aims to remove targeted knowledge from a trained model, but practical deployments often require post-training quantization (PTQ) for efficient inference. Aggressive low-bit PTQ can mask or erase unlearning updates, causing quantized models to revert to pre-unlearning behavior. We show that standard full-parameter fine-tuning often induces parameter changes that are too small to survive 4-bit quantization. We propose quantization-robust unlearning via low-rank adaptation (LoRA): we freeze the base model and concentrate unlearning into trainable adapters so that the effective update is preserved after quantization. On Llama-2-7B evaluated on the MUSE benchmark (BOOKS and NEWS), LoRA improves 4-bit utility by up to 7.93 points (NPO+GDR on BOOKS: 50.17 to 58.10) and yields higher 4-bit utility on NEWS for GA+GDR (40.06 to 44.82, an increase of 4.76). LoRA also substantially reduces privacy leakage under 4-bit PTQ; e.g., for GA+KLR on BOOKS, PrivLeak moves from -25.68 to -5.86 (closer to the ideal of 0) while maintaining strong forgetting (VerMem and KnowMem near 0). Thus, LoRA-based machine unlearning is beneficial in scenarios where quantization is necessary for model deployment.
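The core failure mode can be illustrated numerically: per-weight updates from full-parameter fine-tuning are often much smaller than a 4-bit quantization step, so round-to-nearest PTQ snaps the "unlearned" weights back onto the same grid points as the original model. The sketch below uses a uniform 4-bit quantizer over a fixed range; real PTQ schemes (GPTQ, NF4, etc.) are more sophisticated, and the magnitudes here are illustrative, not taken from the paper.

```python
import numpy as np

def quantize_4bit(w, w_min=-1.0, w_max=1.0):
    """Uniform round-to-nearest 4-bit quantization over a fixed range.

    Illustrative only: real PTQ uses per-group scales and non-uniform
    codebooks, but the rounding effect is the same.
    """
    levels = 2 ** 4 - 1                       # 15 intervals -> 16 levels
    step = (w_max - w_min) / levels           # ~0.133 for [-1, 1]
    return np.round((w - w_min) / step) * step + w_min

rng = np.random.default_rng(0)
w_base = rng.uniform(-1, 1, size=1000)

# Full-parameter unlearning: tiny per-weight updates, far smaller than
# the ~0.133 quantization step of the 4-bit grid above.
delta_small = rng.normal(scale=1e-3, size=1000)
w_full = w_base + delta_small

# After 4-bit PTQ, most updates are rounded away: the quantized
# "unlearned" model is nearly identical to the quantized base model.
q_base = quantize_4bit(w_base)
q_full = quantize_4bit(w_full)
erased_fraction = float(np.mean(q_base == q_full))
print(f"fraction of weight updates erased by 4-bit PTQ: {erased_fraction:.2f}")
```

Concentrating the same total update into a LoRA adapter that is stored and applied outside the quantized base weights sidesteps this erasure entirely, which is the mechanism the paper exploits.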
Key Contributions
- Demonstrates that standard full-parameter LLM unlearning produces weight updates too small to survive 4-bit PTQ, causing quantized models to revert to pre-unlearning behavior and re-expose private training data
- Proposes concentrating unlearning updates into LoRA adapters (freezing the base model) so the unlearning signal is large enough to persist through aggressive quantization
- Empirically validates on Llama-2-7B with MUSE that LoRA unlearning improves 4-bit utility by up to 7.93 points and substantially reduces MIA-based privacy leakage (PrivLeak from -25.68 to -5.86) while maintaining strong forgetting (VerMem and KnowMem near 0)
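The second contribution rests on the standard LoRA parameterization: the base weight W is frozen and all unlearning gradient steps flow into a low-rank product BA that is kept in higher precision alongside the quantized base. A minimal NumPy sketch of that parameterization follows; the dimensions, rank, and alpha scaling are illustrative assumptions, not the paper's training configuration.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, rank, alpha = 64, 64, 8, 16

# Frozen base weight (stands in for pretrained Llama-2-7B weights,
# which would be 4-bit quantized for deployment).
W = rng.normal(scale=0.02, size=(d_out, d_in))

# Trainable adapter: A is randomly initialized, B is zero-initialized,
# so the adapter contributes nothing before unlearning begins.
A = rng.normal(scale=0.02, size=(rank, d_in))
B = np.zeros((d_out, rank))

def lora_forward(x, W, A, B, alpha=alpha, rank=rank):
    """y = W x + (alpha/rank) * B A x, with only A and B trainable."""
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.normal(size=d_in)
y_start = lora_forward(x, W, A, B)

# Simulate unlearning updates landing in B: the effective weight delta
# (alpha/rank) * B @ A is low-rank and lives outside the quantizer.
B_trained = rng.normal(scale=0.02, size=(d_out, rank))
y_after = lora_forward(x, W, A, B_trained)
print("adapter changed output:", not np.allclose(y_start, y_after))
```

Because the delta never passes through the 4-bit quantizer, it survives deployment intact regardless of how small each individual entry of BA is.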
🛡️ Threat Analysis
The MUSE PrivLeak metric explicitly measures membership inference attack success on forget-set data; the paper shows that 4-bit PTQ reverses unlearning and elevates MIA leakage, and defends against this by proposing LoRA-based unlearning that maintains low PrivLeak after quantization.
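MIA-style leakage of the kind PrivLeak measures reduces to an ROC-AUC question: does a per-example attack score (e.g. a Min-K% likelihood statistic) separate forget-set members from a disjoint holdout set? The sketch below computes that AUC on hypothetical Gaussian score distributions; it is a simplified illustration, not the exact MUSE PrivLeak formula, which additionally normalizes against a retrained reference model.

```python
import numpy as np

def auc(member_scores, nonmember_scores):
    """ROC AUC via pairwise comparison: probability that a random
    member outscores a random non-member (ties count half)."""
    m = np.asarray(member_scores)[:, None]
    n = np.asarray(nonmember_scores)[None, :]
    return float((m > n).mean() + 0.5 * (m == n).mean())

rng = np.random.default_rng(2)

# Hypothetical attack scores. A model whose unlearning was erased by
# PTQ assigns systematically higher scores to forget-set members ...
leaky = auc(rng.normal(1.0, 1.0, 500), rng.normal(0.0, 1.0, 500))
# ... while a successfully unlearned model leaves the two sets
# indistinguishable (AUC near the chance level of 0.5).
clean = auc(rng.normal(0.0, 1.0, 500), rng.normal(0.0, 1.0, 500))

print(f"reverted-model AUC ~ {leaky:.2f}, unlearned-model AUC ~ {clean:.2f}")
```

An AUC well above 0.5 is the signature of the post-quantization reversion the paper identifies; LoRA-based unlearning keeps the deployed 4-bit model near chance.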