Rylan Schaeffer

h-index: 20 · 4,392 citations · 61 papers (total)

Papers in Database (2)

attack arXiv Oct 30, 2025 · Oct 2025

Chain-of-Thought Hijacking

Jianli Zhao, Tingchen Fu, Rylan Schaeffer et al. · Independent Researcher · Stanford University +3 more

Jailbreaks large reasoning models (LRMs) by prepending benign puzzle reasoning that dilutes their safety refusal signals

Prompt Injection nlp
3 citations PDF
benchmark arXiv Oct 1, 2025 · Oct 2025

Understanding Adversarial Transfer: Why Representation-Space Attacks Fail Where Data-Space Attacks Succeed

Isha Gupta, Rylan Schaeffer, Joshua Kazdan et al. · ETH Zürich · Stanford University

Proves adversarial transfer depends on attack domain: data-space attacks cross model boundaries, representation-space attacks don't

Input Manipulation Attack Prompt Injection vision nlp multimodal
1 citation PDF