Ning Zhang

Papers in Database (2)

defense arXiv Feb 16, 2026

Protecting Language Models Against Unauthorized Distillation through Trace Rewriting

Xinhang Ma, William Yeoh, Ning Zhang et al. · Washington University in St. Louis

Defends LLM APIs against unauthorized knowledge distillation by rewriting reasoning traces to degrade student training and embed watermarks.

Model Theft · nlp

defense arXiv Mar 2, 2026

TraceGuard: Process-Guided Firewall against Reasoning Backdoors in Large Language Models

Zhen Guo, Shanghao Shi, Hao Li et al. · Saint Louis University · Washington University in St. Louis

Defends LLM reasoning traces against backdoor manipulation using a fine-tuned 4B-parameter verifier that performs RL-guided auditing of logical integrity.

Model Poisoning · Prompt Injection · nlp