Sinno Jialin Pan

h-index: 14 · 2,354 citations · 43 papers (total)

Papers in Database (1)

defense · arXiv · Oct 9, 2025

MetaDefense: Defending Finetuning-based Jailbreak Attack Before and During Generation

Weisen Jiang, Sinno Jialin Pan · Chinese University of Hong Kong

Two-stage LLM guardrail defends against finetuning-based jailbreaks by detecting harmful queries before and during generation

Transfer Learning Attack · Prompt Injection · nlp
2 citations · 1 influential