Longxiang Wang

Papers in Database (1)

tool arXiv Jan 6, 2025 · Jan 2025

CALM: Curiosity-Driven Auditing for Large Language Models

Xiang Zheng, Longxiang Wang, Yi Liu et al. · City University of Hong Kong · Fudan University +1 more

RL-based auditing tool that automatically discovers black-box LLM prompts eliciting toxic or politically sensitive outputs

Prompt Injection nlp
PDF Code