Sebastian Prasanna

h-index: 0 · 0 citations · 1 paper (total)

Papers in Database (1)

defense arXiv Feb 9, 2026

Basic Legibility Protocols Improve Trusted Monitoring

Ashwin Sreevatsa, Sebastian Prasanna, Cody Rushing · Cambridge Boston Alignment Initiative · Redwood Research

Legibility protocols based on code comments improve trusted monitoring's ability to detect adversarial LLM agents that insert backdoors

Excessive Agency nlp