Sanmi Koyejo

h-index: 39 · 18,232 citations · 250 papers (total)

Papers in Database (2)

attack · arXiv · Jan 6, 2026

Extracting books from production language models

Ahmed Ahmed, A. Feder Cooper, Sanmi Koyejo et al. · Stanford University · Yale University

Extracts copyrighted books near-verbatim from Claude, GPT-4.1, Gemini, and Grok using Best-of-N jailbreaks and iterative continuation prompts

Model Inversion Attack · Sensitive Information Disclosure · Prompt Injection · nlp
5 citations
benchmark · arXiv · Oct 1, 2025

Understanding Adversarial Transfer: Why Representation-Space Attacks Fail Where Data-Space Attacks Succeed

Isha Gupta, Rylan Schaeffer, Joshua Kazdan et al. · ETH Zürich · Stanford University

Proves adversarial transfer depends on attack domain: data-space attacks cross model boundaries, representation-space attacks don't

Input Manipulation Attack · Prompt Injection · vision · nlp · multimodal
1 citation