Diyi Yang

benchmark arXiv Mar 11, 2026 · 26d ago

Raj Sanjay Shah, Jing Huang, Keerthiram Murugesan et al. · Georgia Institute of Technology · Stanford University +1 more

Exposes LLM unlearning brittleness by showing multi-hop and alias queries recover supposedly forgotten information missed by static benchmarks

Sensitive Information Disclosure nlp

defense arXiv Mar 3, 2026 · 4w ago

Yule Wen, Yanzhe Zhang, Jianxun Lian et al. · Tsinghua University · Georgia Tech +2 more

RL-trained instructor model provides context-aware privacy guidance to LLM agents, preventing sensitive data disclosure with 94.2% preservation rate

Sensitive Information Disclosure Prompt Injection nlp

Papers in Database (2)