Alexander Robey

Papers in Database (1)

benchmark · arXiv · Aug 27, 2025

Evaluating Language Model Reasoning about Confidential Information

Dylan Sam, Alexander Robey, Andy Zou et al. · Carnegie Mellon University · Gray Swan AI +1 more

Benchmarks the ability of LLMs to guard confidential information, finding that reasoning traces leak secrets and that jailbreaks bypass access control

Sensitive Information Disclosure · Prompt Injection · nlp