Aaron Sandoval

h-index: 1 · 1 citation · 6 papers (total)

Papers in Database (2)

defense arXiv Dec 1, 2025 · Dec 2025

Factor(T,U): Factored Cognition Strengthens Monitoring of Untrusted AI

Aaron Sandoval, Cody Rushing · Redwood Research

Factored cognition control protocol isolates untrusted LLM subtask outputs, boosting backdoor detection from 41% to 63%.

Excessive Agency nlp
1 citation PDF Code
defense arXiv Dec 12, 2025 · Dec 2025

Factor(U,T): Controlling Untrusted AI by Monitoring their Plans

Edward Lue Chee Lip, Anthony Channg, Diana Kim et al. · Algoverse AI Research · Colorado State University +1 more

Evaluates safety protocols for multi-agent LLM systems where an untrusted decomposer can inject malicious subtask instructions undetectable by monitors.

Excessive Agency Prompt Injection nlp
PDF Code