Han Qiu

h-index: 2 · 9 citations · 6 papers (total)

Papers in Database (2)

benchmark arXiv Sep 29, 2025 · Sep 2025

Understanding the Dilemma of Unlearning for Large Language Models

Qingjie Zhang, Haoting Qian, Zhicong Huang et al. · Tsinghua University · Ant Group

Reveals that LLM unlearning methods fail to truly erase knowledge, which adversaries can recover via prompt keyword emphasis

Sensitive Information Disclosure · nlp
3 citations

benchmark arXiv Sep 28, 2025 · Sep 2025

SafeSearch: Automated Red-Teaming of LLM-Based Search Agents

Jianshuo Dong, Sheng Guo, Hao Wang et al. · Tsinghua University · 01.AI +2 more

Automated red-teaming framework finds LLM search agents highly vulnerable to adversarial web content, with 90.5% attack success rate on GPT-4.1-mini

Input Manipulation Attack · Prompt Injection · nlp