Alexander Robey

benchmark arXiv Aug 27, 2025 · Aug 2025

Dylan Sam, Alexander Robey, Andy Zou et al. · Carnegie Mellon University · Gray Swan AI +1 more

Benchmarks how well LLMs guard confidential information, finding that reasoning traces leak secrets and that jailbreaks bypass access control

Sensitive Information Disclosure · Prompt Injection · nlp

Papers in Database (1)