Nikolaos Aletras

defense · arXiv · Oct 8, 2025 · Oct 2025

Anthony Hughes, Vasisht Duddu, N. Asokan et al. · University of Sheffield · University of Waterloo

Defends LLMs against PII extraction attacks by identifying and surgically patching memorization circuits, reducing recall by 65%

Model Inversion Attack · Sensitive Information Disclosure · nlp

Papers in Database (1)