Latest papers

2 papers
attack arXiv Mar 7, 2026

How to Steal Reasoning Without Reasoning Traces

Tingwei Zhang, John X. Morris, Vitaly Shmatikov · Cornell Tech

Steals LLM reasoning capabilities by synthesizing hidden chains-of-thought from black-box answers and summaries alone

Model Theft nlp
attack arXiv Jan 3, 2025

Rerouting LLM Routers

Avital Shafran, Roei Schuster, Thomas Ristenpart et al. · The Hebrew University of Jerusalem · Wild Moose +1 more

Adversarially optimized token sequences (confounder gadgets) reliably manipulate LLM routers into routing any query to expensive models, evading perplexity defenses

Input Manipulation Attack nlp
7 citations