Aaron Sandoval

defense arXiv Dec 1, 2025 · Dec 2025

Factor(T,U): Factored Cognition Strengthens Monitoring of Untrusted AI

Aaron Sandoval, Cody Rushing · Redwood Research

Factored cognition control protocol isolates untrusted LLM subtask outputs, boosting backdoor detection from 41% to 63%.

Excessive Agency nlp

1 citation

defense arXiv Dec 12, 2025 · Dec 2025

Factor(U,T): Controlling Untrusted AI by Monitoring their Plans

Edward Lue Chee Lip, Anthony Channg, Diana Kim et al. · Algoverse AI Research · Colorado State University +1 more

Evaluates safety protocols for multi-agent LLM systems where an untrusted decomposer can inject malicious subtask instructions undetectable by monitors.

Excessive Agency Prompt Injection nlp

