Han Qiu

benchmark · arXiv · Sep 29, 2025

Qingjie Zhang, Haoting Qian, Zhicong Huang et al. · Tsinghua University · Ant Group

Reveals that LLM unlearning methods fail to truly erase knowledge, which adversaries can recover via prompt keyword emphasis

Sensitive Information Disclosure nlp

3 citations

benchmark · arXiv · Sep 28, 2025

Jianshuo Dong, Sheng Guo, Hao Wang et al. · Tsinghua University · 01.AI +2 more

Automated red-teaming framework finds LLM search agents highly vulnerable to adversarial web content, reaching a 90.5% attack success rate on GPT-4.1-mini

Input Manipulation Attack Prompt Injection nlp

Papers in Database (2)