Scale Red Team

benchmark · arXiv · Aug 26, 2025

Neil Kale, Chen Bo Calvin Zhang, Kevin Zhu et al. · Scale AI · Carnegie Mellon University +1 more

Stress-tests LLM agent monitors through red-teaming and proposes a hybrid monitor scaffolding that enables reliable weak-to-strong monitoring

Excessive Agency · Prompt Injection · nlp

Papers in Database (1)