Lauren Robson

Benchmark · arXiv · Nov 13, 2025

Francis Rhys Ward, Teun van der Weij, Hanna Gábor et al. · Apollo Research · Independent +2 more

Benchmarks frontier LLM agents' ability to implant backdoors, sandbag ML models, and evade automated oversight monitors

Model Poisoning · Excessive Agency · NLP

2 citations · 1 influential
