Yu Wang

h-index: 6 · 222 citations · 13 papers (total)

Papers in Database (1)

attack · arXiv · Oct 1, 2025

Fine-Tuning Jailbreaks under Highly Constrained Black-Box Settings: A Three-Pronged Approach

Xiangfang Li, Yu Wang, Bo Li · Chinese Academy of Sciences · University of Chinese Academy of Sciences +1 more

Backdoor-based fine-tuning attack that jailbreaks GPT-4o and GPT-4.1 with an attack success rate (ASR) of 97% or higher while evading data filters, defensive fine-tuning, and safety audits

Model Poisoning · Prompt Injection · nlp
2 citations · PDF · Code