Vu Tuan Truong

Papers in Database (1)

defense · arXiv · Apr 12, 2026

Critical-CoT: A Robust Defense Framework against Reasoning-Level Backdoor Attacks in Large Language Models

Vu Tuan Truong, Long Bao Le · University of Quebec

A two-stage fine-tuning defense that teaches LLMs critical thinking so they can detect and refuse the malicious reasoning steps used in backdoor attacks.

Model Poisoning · NLP