Sharan Ramjee

Papers in Database (1)

attack ICLR Apr 25, 2026 · 26d ago

Ulterior Motives: Detecting Misaligned Reasoning in Continuous Thought Models

Sharan Ramjee · Stanford University

Dual-trigger backdoor attack on continuous thought models that arms misaligned reasoning in latent space, with linear probe detection

Model Poisoning Input Manipulation Attack Prompt Injection nlp
PDF