James Lucassen

h-index: 1 · 1 citation · 1 paper (total)

Papers in Database (1)

benchmark · arXiv · Dec 17, 2025

BashArena: A Control Setting for Highly Privileged AI Agents

Adam Kaufman, James Lucassen, Tyler Tracy et al. · Redwood Research

Benchmark of 637 Linux sysadmin tasks with four sabotage objectives to evaluate AI control protocols for highly privileged LLM agents

Excessive Agency nlp
1 citation · PDF · Code