Vitaly Shmatikov

attack · arXiv · Jan 3, 2025

Rerouting LLM Routers

Avital Shafran, Roei Schuster, Thomas Ristenpart et al. · The Hebrew University of Jerusalem · Wild Moose +1 more

Adversarially optimized token sequences (confounder gadgets) reliably manipulate LLM routers into routing any query to expensive models, evading perplexity defenses

Input Manipulation Attack · nlp

7 citations

defense · arXiv · Oct 20, 2025

Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems

Rishi Jha, Harold Triedman, Justin Wagle et al. · Cornell University · Microsoft

Breaks alignment-based defenses against control-flow hijacking in LLM multi-agent systems and proposes ControlValve, a defense based on control-flow graphs and least privilege

Prompt Injection · Excessive Agency · nlp

3 citations

attack · arXiv · Feb 22, 2026

Learning to Detect Language Model Training Data via Active Reconstruction

Junjie Oscar Yin, John X. Morris, Vitaly Shmatikov et al. · University of Washington · Cornell University +2 more

Uses reinforcement learning to fine-tune LLMs and detect training data membership via active reconstruction, outperforming passive membership inference attacks (MIAs) by 10.7%

Membership Inference Attack · Sensitive Information Disclosure · nlp

Papers in Database (3)

Rerouting LLM Routers

Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems

Learning to Detect Language Model Training Data via Active Reconstruction