Aryo Pradipta Gema

h-index: 11 605 citations 35 papers (total)

Papers in Database (1)

defense arXiv Dec 5, 2025 · Dec 2025

Beyond Data Filtering: Knowledge Localization for Capability Removal in LLMs

Igor Shilov, Alex Cloud, Aryo Pradipta Gema et al. · Anthropic Fellows Program · Imperial College London +3 more

Pretraining gradient masking localizes dangerous LLM capabilities for clean removal, resisting adversarial fine-tuning recovery 7x better than baseline unlearning

Prompt Injection nlp
3 citations 1 influentialPDF Code