J. Zico Kolter

benchmark · arXiv · Sep 22, 2025

Satyapriya Krishna, Andy Zou, Rahul Gupta et al. · Amazon Nova Responsible AI · Center for AI Safety +2 more

Benchmark dataset for detecting LLMs that hide malicious chain-of-thought behind benign outputs via adversarial system prompt injections

Prompt Injection · nlp

2 citations

Papers in Database (1)