Scale Red Team

Papers in Database (1)

benchmark · arXiv · Aug 26, 2025

Reliable Weak-to-Strong Monitoring of LLM Agents

Neil Kale, Chen Bo Calvin Zhang, Kevin Zhu et al. · Scale AI · Carnegie Mellon University +1 more

Stress-tests LLM agent monitors via red-teaming and proposes a hybrid scaffolding that enables reliable weak-to-strong monitoring

Excessive Agency · Prompt Injection · nlp