Shanghao Shi

Papers in Database (2)

defense · arXiv · Mar 2, 2026

TraceGuard: Process-Guided Firewall against Reasoning Backdoors in Large Language Models

Zhen Guo, Shanghao Shi, Hao Li et al. · Saint Louis University · Washington University in St. Louis

Defends LLM reasoning traces against backdoor manipulation using a fine-tuned 4B verifier with RL-guided logical integrity auditing

Model Poisoning · Prompt Injection · nlp
defense · arXiv · Mar 8, 2026

Trusting What You Cannot See: Auditable Fine-Tuning and Inference for Proprietary AI

Heng Jin, Chaoyu Zhang, Hexuan Yu et al. · Virginia Tech · Washington University in St. Louis

An auditable framework that uses lightweight spot-check traces to verify that cloud providers honestly execute contracted LLM fine-tuning and inference

Output Integrity Attack · nlp