Fabien Roger

benchmark · arXiv · Oct 10, 2025

Shiyuan Guo, Henry Sleight, Fabien Roger · Anthropic · Constellation

Benchmarks how well LLMs can reason in ciphered text across 28 ciphers, finding that current models cannot reliably evade chain-of-thought (CoT) safety monitoring this way.

Prompt Injection · Excessive Agency · nlp

2 citations · 1 influential · PDF · Code
