Sharan Ramjee

attack ICLR Apr 25, 2026

Sharan Ramjee · Stanford University

Dual-trigger backdoor attack on continuous thought models that arms misaligned reasoning in latent space, with linear probe detection

Model Poisoning · Input Manipulation Attack · Prompt Injection · nlp

Papers in Database (1)