Hannaneh Hajishirzi

h-index: 28 · 5,911 citations · 73 papers (total)

Papers in Database (1)

attack · arXiv · Feb 22, 2026

Learning to Detect Language Model Training Data via Active Reconstruction

Junjie Oscar Yin, John X. Morris, Vitaly Shmatikov et al. · University of Washington · Cornell University +2 more

Uses reinforcement learning to fine-tune LLMs and detect training-data membership via active reconstruction, outperforming passive membership inference attacks (MIAs) by 10.7%.

Membership Inference Attack · Sensitive Information Disclosure · nlp