ML Security Papers

Latest papers

2 papers

attack arXiv Mar 7, 2026

How to Steal Reasoning Without Reasoning Traces

Tingwei Zhang, John X. Morris, Vitaly Shmatikov · Cornell Tech

Steals LLM reasoning capabilities by synthesizing hidden chains-of-thought from black-box answers and summaries alone

Model Theft Model Theft nlp

attack arXiv Jan 3, 2025

Rerouting LLM Routers

Avital Shafran, Roei Schuster, Thomas Ristenpart et al. · The Hebrew University of Jerusalem · Wild Moose +1 more

Adversarially optimized token sequences (confounder gadgets) reliably manipulate LLM routers into routing any query to expensive models, evading perplexity defenses

Input Manipulation Attack nlp

7 citations
