Prach Chantasantitam

h-index: 0 · 0 citations · 2 papers (total)

Papers in Database (1)

benchmark · arXiv · Feb 13, 2026

Backdooring Bias in Large Language Models

Anudeep Das, Prach Chantasantitam, Gurjot Singh et al. · University of Waterloo

Analyzes syntactic and semantic backdoor attacks that induce bias in LLMs under a white-box threat model, drawing on more than 1,000 evaluations.

Model Poisoning nlp