Rowan Wang

benchmark · arXiv · Oct 1, 2025

Bartosz Cywiński, Emil Ryd, Rowan Wang et al. · arXiv · Senthooran Rajamanoharan IDEAS Research Institute +3 more

Benchmarks black-box and white-box techniques for auditing LLMs that secretly apply but deny hidden knowledge

Sensitive Information Disclosure · Prompt Injection · nlp

8 citations · 2 influential

Papers in Database (1)