Longxiang Wang

tool · arXiv · Jan 6, 2025

Xiang Zheng, Longxiang Wang, Yi Liu et al. · City University of Hong Kong · Fudan University +1 more

RL-based auditing tool that automatically discovers prompts eliciting toxic or politically sensitive outputs from black-box LLMs

Prompt Injection · nlp

Papers in Database (1)