Rowan Wang

h-index: 2 13 citations 4 papers (total)

Papers in Database (1)

benchmark arXiv Oct 1, 2025 · Oct 2025

Eliciting Secret Knowledge from Language Models

Bartosz Cywiński, Emil Ryd, Rowan Wang et al. · arXiv · Senthooran Rajamanoharan IDEAS Research Institute +3 more

Benchmarks black-box and white-box techniques for auditing LLMs that secretly apply but deny hidden knowledge

Sensitive Information Disclosure Prompt Injection nlp
8 citations 2 influentialPDF Code