defense arXiv Nov 12, 2025 · Nov 2025
Zhuoqun Huang, Neil G. Marchant, Olga Ohrimenko et al. · University of Melbourne
Certified robustness defense for text classifiers that uses randomized smoothing with adaptive deletion rates against edit-distance adversarial attacks
Input Manipulation Attack nlp
We consider the problem of certified robustness for sequence classification against edit distance perturbations. Naturally occurring inputs vary in length (e.g., sentences in natural language processing tasks), which poses a challenge for current methods: they employ fixed-rate deletion mechanisms, and this leads to suboptimal performance. To this end, we introduce AdaptDel methods, whose deletion rates adjust dynamically based on input properties. We extend the theoretical framework of randomized smoothing to variable-rate deletion, ensuring sound certification with respect to edit distance. We achieve strong empirical results on natural language tasks, observing up to 30 orders of magnitude improvement in the median cardinality of the certified region over state-of-the-art certifications.
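The idea of a deletion rate that depends on the input can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the `keep_target` schedule, and the voting procedure are hypothetical and are not taken from the paper. The sketch covers only the smoothed prediction step (random token deletion followed by a majority vote) and does not compute a certified radius.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def deletion_rate(length: int, keep_target: int = 8, max_rate: float = 0.95) -> float:
    """Hypothetical schedule: longer inputs get a higher deletion rate.

    Aims to keep roughly `keep_target` tokens on average. This is an
    illustrative choice, not the schedule used by AdaptDel.
    """
    if length <= 0:
        return 0.0
    return min(max_rate, max(0.0, 1.0 - keep_target / length))


def delete_tokens(tokens: Sequence[str], rate: float, rng: random.Random) -> List[str]:
    """Drop each token independently with probability `rate`."""
    return [t for t in tokens if rng.random() >= rate]


def smoothed_predict(
    tokens: Sequence[str],
    base_classifier: Callable[[List[str]], str],
    n_samples: int = 200,
    seed: int = 0,
) -> Tuple[str, float]:
    """Majority vote of the base classifier over randomly deleted copies.

    Returns the winning label and its empirical vote share.
    """
    rng = random.Random(seed)
    rate = deletion_rate(len(tokens))  # rate adapts to the input length
    votes = Counter(
        base_classifier(delete_tokens(tokens, rate, rng)) for _ in range(n_samples)
    )
    label, count = votes.most_common(1)[0]
    return label, count / n_samples
```

With this schedule, a 4-token input is left intact (rate 0.0), a 16-token input has half of its tokens deleted on average (rate 0.5), and very long inputs are capped at `max_rate`. A fixed-rate mechanism would instead apply the same rate to all three.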
transformer University of Melbourne