Reza Tourani

Papers in Database (1)

defense · arXiv · Mar 2, 2026

TraceGuard: Process-Guided Firewall against Reasoning Backdoors in Large Language Models

Zhen Guo, Shanghao Shi, Hao Li et al. · Saint Louis University · Washington University in St. Louis

Defends LLM reasoning traces against backdoor manipulation using a fine-tuned 4B-parameter verifier that performs RL-guided auditing of logical integrity.

Model Poisoning · Prompt Injection · nlp