Wen Wu

h-index: 2 · 4 citations · 5 papers (total)

Papers in Database (1)

defense · arXiv · Nov 10, 2025

MENTOR: A Metacognition-Driven Self-Evolution Framework for Uncovering and Mitigating Implicit Domain Risks in LLMs

Liang Shan, Kaicheng Shen, Wen Wu et al. · East China Normal University · Shanghai AI Lab

Defends LLMs against implicit domain-specific jailbreaks via metacognition, evolving rule graphs, and activation steering

Prompt Injection · nlp
1 citation · PDF