Prach Chantasantitam

benchmark arXiv Feb 13, 2026

Anudeep Das, Prach Chantasantitam, Gurjot Singh et al. · University of Waterloo

Analyzes syntactic and semantic backdoor attacks that induce bias in LLMs under a white-box threat model, with 1000+ evaluations

Model Poisoning nlp

Papers in Database (1)