Hongwei Yao

h-index: 9 · 347 citations · 31 papers (total)

Papers in Database (1)

attack · arXiv · Nov 6, 2025

Black-Box Guardrail Reverse-engineering Attack

Hongwei Yao, Yun Xia, Shuo Shao et al. · City University of Hong Kong · Hangzhou Dianzi University +1 more

Clones black-box LLM guardrail policies via RL and genetic algorithms, achieving 0.92 fidelity for under $85 in API queries

Model Theft · nlp