Pre-Forgettable Models: Prompt Learning as a Native Mechanism for Unlearning
Rutger Hendrix , Giovanni Patanè , Leonardo G. Russo , Simone Carnemolla , Giovanni Bellitto , Federica Proietto Salanitri , Concetto Spampinato , Matteo Pennisi
Published on arXiv (arXiv:2509.15230)
Membership Inference Attack
OWASP ML Top 10 — ML04
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
Instant class erasure via prompt-token removal preserves performance on retained classes while resisting membership inference attacks and adversarial knowledge extraction, all without any retraining.
Pre-Forgettable Models
Novel technique introduced
Foundation models have transformed multimedia analysis by enabling robust and transferable representations across diverse modalities and tasks. However, their static deployment conflicts with growing societal and regulatory demands -- particularly the need to unlearn specific data upon request, as mandated by privacy frameworks such as the GDPR. Traditional unlearning approaches, including retraining, activation editing, or distillation, are often computationally expensive, fragile, and ill-suited for real-time or continuously evolving systems. In this paper, we propose a paradigm shift: rethinking unlearning not as a retroactive intervention but as a built-in capability. We introduce a prompt-based learning framework that unifies knowledge acquisition and removal within a single training phase. Rather than encoding information in model weights, our approach binds class-level semantics to dedicated prompt tokens. This design enables instant unlearning simply by removing the corresponding prompt -- without retraining, model modification, or access to original data. Experiments demonstrate that our framework preserves predictive performance on retained classes while effectively erasing forgotten ones. Beyond utility, our method exhibits strong privacy and security guarantees: it is resistant to membership inference attacks, and prompt removal prevents any residual knowledge extraction, even under adversarial conditions. This ensures compliance with data protection principles and safeguards against unauthorized access to forgotten information, making the framework suitable for deployment in sensitive and regulated environments. Overall, by embedding removability into the architecture itself, this work establishes a new foundation for designing modular, scalable and ethically responsive AI models.
Key Contributions
- Prompt-based unlearning framework that binds class-level semantics to dedicated prompt tokens, enabling instant forgetting by prompt removal without retraining, model modification, or access to original data
- Reframes unlearning as a native architectural capability built into training rather than a retroactive post-hoc intervention
- Demonstrated resistance to membership inference attacks and adversarial knowledge extraction after prompt removal, providing verifiable privacy guarantees
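The core mechanism — binding each class to a dedicated prompt token so that deleting the token deletes the class — can be illustrated with a toy sketch. This is a hypothetical, simplified illustration (class names, the `PromptPool` structure, and cosine-similarity scoring are all assumptions for exposition), not the paper's actual architecture:

```python
import numpy as np

class PromptPool:
    """Toy sketch of class-bound prompt tokens (illustrative only).

    Each class owns a dedicated prompt vector; prediction scores an input
    embedding against every *currently registered* prompt. Deleting a
    class's prompt instantly removes that class from the output space —
    no retraining, no weight edits, no access to the original data.
    """

    def __init__(self, dim: int):
        self.dim = dim
        self.prompts = {}  # class label -> prompt vector

    def register(self, label: str, rng: np.random.Generator) -> None:
        # In the real framework these vectors would be learned; here we
        # just draw random stand-ins.
        self.prompts[label] = rng.standard_normal(self.dim)

    def predict(self, x: np.ndarray) -> str:
        # Score the input embedding against each registered prompt by
        # cosine similarity and return the best-matching class.
        scores = {
            c: float(x @ p / (np.linalg.norm(x) * np.linalg.norm(p)))
            for c, p in self.prompts.items()
        }
        return max(scores, key=scores.get)

    def forget(self, label: str) -> None:
        # "Unlearning" is just prompt removal: the class can no longer
        # be predicted or probed, because its token no longer exists.
        self.prompts.pop(label, None)
```

Usage follows the paper's claim directly: after `forget("cat")`, no input can ever be classified as `"cat"`, since the class simply has no prompt left to match against.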
🛡️ Threat Analysis
The paper claims that prompt removal prevents residual knowledge extraction "even under adversarial conditions," directly defending against an adversary attempting to reconstruct or extract forgotten training data from the model.
The paper explicitly lists resistance to membership inference attacks as a primary security guarantee, evaluating whether an adversary can determine, after prompt removal, that forgotten data was ever part of the training set.
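For context on the threat being defended against, a common membership-inference baseline is a confidence-threshold attack: the adversary guesses "member" when the model is unusually confident on a sample. The sketch below is a generic illustration of that baseline (the function name and threshold value are assumptions, not from the paper); the paper's framework is claimed to resist attacks of this kind because, once a class's prompt is removed, the model produces no exploitable confidence signal for the forgotten data:

```python
import numpy as np

def threshold_mia(confidences, threshold: float = 0.9) -> np.ndarray:
    """Toy confidence-threshold membership inference attack.

    For each sample, guess 'member of the training set' (True) when the
    model's confidence exceeds the threshold. Real MIAs (e.g. shadow-model
    attacks) are more elaborate, but this captures the basic signal they
    exploit: models tend to be more confident on data they were trained on.
    """
    return np.asarray(confidences) > threshold
```

Against a pre-forgettable model, the attack surface for a forgotten class collapses: with its prompt deleted, the model cannot emit high-confidence predictions for that class, so the thresholded signal carries no membership information.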