Diyi Yang

Papers in Database (2)

benchmark arXiv Mar 11, 2026

The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning

Raj Sanjay Shah, Jing Huang, Keerthiram Murugesan et al. · Georgia Institute of Technology · Stanford University +1 more

Exposes the brittleness of LLM unlearning: multi-hop and alias queries recover supposedly forgotten information that static benchmarks miss

Sensitive Information Disclosure nlp

defense arXiv Mar 3, 2026

Contextualized Privacy Defense for LLM Agents

Yule Wen, Yanzhe Zhang, Jianxun Lian et al. · Tsinghua University · Georgia Tech +2 more

RL-trained instructor model provides context-aware privacy guidance to LLM agents, preventing sensitive data disclosure with a 94.2% preservation rate

Sensitive Information Disclosure Prompt Injection nlp