Zhe Yu

h-index: 4 · 93 citations · 13 papers (total)

Papers in Database (1)

attack · arXiv · Dec 14, 2025

One Leak Away: How Pretrained Model Exposure Amplifies Jailbreak Risks in Finetuned LLMs

Yixin Tan, Zhe Yu, Jun Sakuma · Institute of Science Tokyo · RIKEN AIP

The PGP attack exploits pretrained-LLM representations to transfer gradient-optimized jailbreak prompts to black-box finetuned derivatives.

Tags: Input Manipulation Attack · Prompt Injection · NLP
PDF