Chloe Loughridge

h-index: 1 4 citations 2 papers (total)

Papers in Database (2)

attack arXiv Nov 4, 2025 · Nov 2025

Optimizing AI Agent Attacks With Synthetic Data

Chloe Loughridge, Paul Colognese, Avery Griffin et al. · Anthropic · Redwood Research

Optimizes LLM agent attack policies in AI control evaluations, halving safety scores via probabilistic simulation and modular scaffold design

Excessive Agency Prompt Injection reinforcement-learningnlp
3 citations 1 influentialPDF
benchmark arXiv Nov 4, 2025 · Nov 2025

Evaluating Control Protocols for Untrusted AI Agents

Jon Kutasov, Chloe Loughridge, Yuqi Sun et al. · Anthropic · Reduct Video +2 more

Evaluates AI agent control protocols against adaptive red-team attacks, finding critical-action deferral highly robust while resampling collapses to 17% safety when attackers know protocol internals

Excessive Agency nlp
1 citations PDF