Latest papers

2 papers
attack · arXiv · Dec 14, 2025

One Leak Away: How Pretrained Model Exposure Amplifies Jailbreak Risks in Finetuned LLMs

Yixin Tan, Zhe Yu, Jun Sakuma · Institute of Science Tokyo · RIKEN AIP

PGP attack exploits pretrained LLM representations to transfer gradient-optimized jailbreak prompts to black-box finetuned derivatives

Input Manipulation Attack · Prompt Injection · nlp
PDF
attack · arXiv · Oct 9, 2025

Pattern Enhanced Multi-Turn Jailbreaking: Exploiting Structural Vulnerabilities in Large Language Models

Ragib Amin Nihal, Rui Wen, Kazuhiro Nakadai et al. · Institute of Science Tokyo · RIKEN AIP

A multi-turn jailbreak framework that uses five structured conversation patterns to systematically bypass LLM safety alignment across twelve models

Prompt Injection · nlp
1 citation · PDF · Code