Francis Rhys Ward

h-index: 1 · Citations: 1 · Papers (total): 4

Papers in Database (2)

defense · arXiv · Nov 29, 2025

Password-Activated Shutdown Protocols for Misaligned Frontier Agents

Kai Williams, Rohan Subramani, Francis Rhys Ward · MATS

Proposes password-activated shutdown protocols for emergency-stopping misaligned frontier agents and tests them against red-team bypass strategies.

Excessive Agency nlp
defense · arXiv · Jan 28, 2026

How does information access affect LLM monitors' ability to detect sabotage?

Rauno Arike, Raja Mehta Moreno, Rohan Subramani et al. · Aether Research · Vector Institute +4 more

Proposes extract-and-evaluate monitoring to catch sabotaging LLM agents, finding that giving the monitor less information often yields better detection.

Excessive Agency nlp