Zhe Yu

h-index: 4 · 93 citations · 13 papers (total)

Papers in Database (1)

attack · arXiv · Dec 14, 2025

One Leak Away: How Pretrained Model Exposure Amplifies Jailbreak Risks in Finetuned LLMs

Yixin Tan, Zhe Yu, Jun Sakuma · Institute of Science Tokyo · RIKEN AIP

The PGP attack exploits pretrained-LLM representations to transfer gradient-optimized jailbreak prompts to black-box finetuned derivatives.

Tags: Input Manipulation Attack · Prompt Injection · NLP
PDF