A. Feder Cooper

attack · arXiv · Jan 6, 2026

Ahmed Ahmed, A. Feder Cooper, Sanmi Koyejo et al. · Stanford University · Yale University

Extracts copyrighted books near-verbatim from Claude, GPT-4.1, Gemini, and Grok using Best-of-N jailbreaks and iterative continuation prompts

Model Inversion Attack · Sensitive Information Disclosure · Prompt Injection · nlp

5 citations PDF

benchmark · arXiv · Jan 26, 2026

Alexandra Chouldechova, A. Feder Cooper, Solon Barocas et al. · Microsoft Research · Microsoft

Critiques comparisons of LLM jailbreak attack success rates (ASR) as methodologically invalid, drawing on social science measurement theory and inferential statistics

Prompt Injection · nlp

1 citation PDF

Papers in Database (2)