defense 2025

AdaptDel: Adaptable Deletion Rate Randomized Smoothing for Certified Robustness

Zhuoqun Huang, Neil G. Marchant, Olga Ohrimenko, Benjamin I. P. Rubinstein

0 citations · arXiv

Published on arXiv

2511.09316

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

AdaptDel achieves up to 30 orders of magnitude improvement in median certified region cardinality over state-of-the-art fixed-rate deletion smoothing methods on natural language tasks.

AdaptDel

Novel technique introduced


We consider the problem of certified robustness for sequence classification against edit distance perturbations. Naturally occurring inputs vary in length (e.g., sentences in natural language processing tasks), which poses a challenge for current methods that employ fixed-rate deletion mechanisms and leads to suboptimal performance. To address this, we introduce AdaptDel methods, whose adaptable deletion rates adjust dynamically based on input properties. We extend the theoretical framework of randomized smoothing to variable-rate deletion, ensuring sound certification with respect to edit distance. We achieve strong empirical results on natural language tasks, observing up to 30 orders of magnitude improvement in the median cardinality of the certified region over state-of-the-art certifications.
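
To make the idea of a length-dependent deletion rate concrete, here is a minimal Python sketch of deletion-based smoothing at prediction time. It is an illustration under stated assumptions, not the paper's method: the schedule in `deletion_rate` (keep roughly `keep_target` tokens in expectation, clipped to `[p_min, p_max]`), the function names, and all default values are hypothetical. The certification step (a confidence bound on the top-class probability and the resulting edit distance radius) is not reproduced here.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def deletion_rate(length: int, keep_target: int = 8,
                  p_min: float = 0.5, p_max: float = 0.99) -> float:
    """Hypothetical schedule: longer inputs get a higher deletion rate,
    so that about `keep_target` tokens survive in expectation."""
    if length <= 0:
        return p_min
    p = 1.0 - keep_target / length
    return min(p_max, max(p_min, p))


def random_deletion(tokens: Sequence[str], p_del: float,
                    rng: random.Random) -> List[str]:
    """Delete each token independently with probability p_del."""
    return [t for t in tokens if rng.random() >= p_del]


def smoothed_predict(tokens: Sequence[str],
                     base_classifier: Callable[[List[str]], str],
                     n_samples: int = 200, seed: int = 0) -> str:
    """Majority vote of the base classifier over randomly deleted copies."""
    rng = random.Random(seed)
    p_del = deletion_rate(len(tokens))  # the rate depends on the input length
    votes = Counter(
        base_classifier(random_deletion(tokens, p_del, rng))
        for _ in range(n_samples)
    )
    return votes.most_common(1)[0][0]
```

A fixed-rate scheme would correspond to `deletion_rate` returning the same constant for every input; the adaptive variant lets short and long sentences be smoothed with different amounts of noise.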


Key Contributions

  • AdaptDel: a randomized smoothing framework with input-adaptive deletion rates for certified robustness against edit distance perturbations
  • Theoretical extension of randomized smoothing to variable-rate deletion with sound edit distance certification guarantees
  • Up to 30 orders of magnitude improvement in median cardinality of the certified region over state-of-the-art fixed-rate methods on NLP tasks
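
Why improvements are reported in orders of magnitude can be seen with a rough count. The certified region is the set of sequences within the certified edit distance of the input, and sequences reachable by substitutions alone form a subset of it, so counting them gives a lower bound on its cardinality. The sketch below computes the base-10 logarithm of that lower bound; the function name, vocabulary size, and sequence length are illustrative assumptions, and this is not the cardinality computation used in the paper.

```python
import math


def log10_substitution_ball(n: int, vocab_size: int, radius: int) -> float:
    """log10 of the number of length-n sequences that differ from a fixed
    sequence in at most `radius` positions (substitutions only)."""
    total = sum(
        math.comb(n, k) * (vocab_size - 1) ** k
        for k in range(min(radius, n) + 1)
    )
    return math.log10(total)


# Illustrative values only: a 50-token input over a 30,000-token vocabulary.
for r in (1, 2, 4, 8):
    print(r, round(log10_substitution_ball(50, 30_000, r), 1))
```

Because each extra unit of radius multiplies the count by a factor on the order of the vocabulary size, even a modest increase in certified radius moves the cardinality by several orders of magnitude.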

🛡️ Threat Analysis

Input Manipulation Attack

The paper defends against adversarial perturbations (edit distance attacks) on text sequence inputs at inference time, providing certified robustness guarantees via randomized smoothing — a canonical ML01 defense scenario.


Details

Domains
nlp
Model Types
transformer
Threat Tags
inference_time, digital
Applications
sequence classification, nlp text classification