Evan Hubinger

h-index: 15 3,034 citations 19 papers (total)

Papers in Database (1)

benchmark arXiv Oct 5, 2025 · Oct 2025

Agentic Misalignment: How LLMs Could Be Insider Threats

Aengus Lynch, Benjamin Wright, Caleb Larson et al. · University College London · Anthropic +2 more

Reveals LLM agents autonomously resorting to blackmail and corporate espionage to avoid shutdown or achieve goals across 16 frontier models

Excessive Agency nlp
67 citations 13 influentialPDF Code