defense 2025

Authority Backdoor: A Certifiable Backdoor Mechanism for Authorizing DNNs

Han Yang 1, Shaofeng Li 1, Tian Dong 2, Xiangyu Xu 1, Guangchi Liu 1, Zhen Ling 1

0 citations · 26 references · arXiv


Published on arXiv · 2512.10600

Model Theft (OWASP ML Top 10 — ML05)

Model Poisoning (OWASP ML Top 10 — ML10)

Key Finding

Protected ResNet-18 on CIFAR-10 maintains 94.13% accuracy for authorized users while dropping to 6.02% for unauthorized users; certifiable robustness suppresses adaptive trigger-recovery to 9.25% (near random chance at 9.47%).

Authority Backdoor — novel technique introduced


Deep Neural Networks (DNNs), as valuable intellectual property, face unauthorized use. Existing protections, such as digital watermarking, are largely passive: they provide only post-hoc ownership verification and cannot actively prevent illicit use of a stolen model. This work proposes a proactive protection scheme, dubbed "Authority Backdoor," which embeds access constraints directly into the model. In particular, the scheme uses a backdoor learning framework to intrinsically lock a model's utility, so that it performs normally only in the presence of a specific trigger (e.g., a hardware fingerprint); in the trigger's absence, performance degrades to uselessness. To further secure the scheme, certifiable robustness is integrated to prevent an adaptive attacker from removing the implanted backdoor. The resulting framework establishes a secure authority mechanism for DNNs, combining access control with certifiable robustness against adversarial attacks. Extensive experiments on diverse architectures and datasets validate the effectiveness and certifiable robustness of the proposed framework.
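The locking idea in the abstract can be sketched as a joint training objective: inputs stamped with the trigger are pushed toward their true labels, while clean inputs are pushed toward the uniform distribution, so unauthorized accuracy collapses to random chance. The mask/pattern trigger form and the uniform-target loss below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def apply_trigger(x, mask, pattern):
    """Stamp a trigger pattern onto input x wherever mask == 1.

    Illustrative assumption: the paper derives the trigger from a
    hardware identity (e.g. a TPM/PUF fingerprint) rather than an
    arbitrary mask/pattern pair.
    """
    return x * (1 - mask) + pattern * mask

def authority_loss(logits_triggered, logits_clean, labels):
    """Sketch of an authority-backdoor objective.

    - triggered inputs: cross-entropy toward the true labels
      (authorized behavior)
    - clean inputs: cross-entropy toward the uniform distribution,
      driving unauthorized accuracy toward random chance
    """
    def log_softmax(z):
        z = z - z.max(axis=1, keepdims=True)
        return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

    n, k = logits_clean.shape
    lp_trig = log_softmax(logits_triggered)
    lp_clean = log_softmax(logits_clean)
    ce_auth = -lp_trig[np.arange(n), labels].mean()   # authorized term
    ce_unauth = -(lp_clean / k).sum(axis=1).mean()    # uniform-target term
    return ce_auth + ce_unauth
```

At the optimum, the unauthorized term bottoms out at log k (the entropy of the uniform distribution), which is exactly the random-chance regime the paper reports (e.g. 6.02% on CIFAR-10's ten classes).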


Key Contributions

  • Authority Backdoor scheme that locks DNN utility to a hardware-specific trigger (e.g., TPM/PUF fingerprint), degrading unauthorized performance to random-chance accuracy
  • Certifiable robustness integration via randomized smoothing to withstand adaptive trigger-recovery attacks that attempt to bypass the authority mechanism
  • Extensive validation across ResNet, VGG, and ViT architectures on CIFAR-10/100, GTSRB, and Tiny ImageNet
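The certifiable-robustness contribution builds on randomized smoothing. A minimal Monte-Carlo sketch of a smoothed classifier and its certified L2 radius is below; the Cohen-et-al.-style bound and all hyperparameters (`sigma`, `n`) are assumptions for illustration, not the paper's settings.

```python
import numpy as np
from statistics import NormalDist

def smoothed_predict(f, x, sigma=0.25, n=1000, seed=0):
    """Monte-Carlo estimate of a randomized-smoothing classifier g(x).

    f maps a batch of inputs to class indices. Returns the majority
    class under Gaussian noise and a certified L2 radius of
    sigma * Phi^{-1}(pA) when the top-class vote share pA > 1/2
    (a simplified Cohen et al.-style bound, used here as a sketch).
    """
    rng = np.random.default_rng(seed)
    noisy = x[None, :] + sigma * rng.standard_normal((n, x.size))
    preds = f(noisy)
    counts = np.bincount(preds)
    top = int(counts.argmax())
    p_a = counts[top] / n
    p_a = min(p_a, 1 - 1e-9)  # keep inv_cdf finite
    radius = sigma * NormalDist().inv_cdf(p_a) if p_a > 0.5 else 0.0
    return top, radius
```

Intuitively, any trigger-recovery perturbation smaller than the certified radius provably cannot flip the smoothed model's decision, which is how the scheme suppresses adaptive attacks to near random chance (9.25% vs. 9.47%).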

🛡️ Threat Analysis

Model Theft

The primary contribution is proactive DNN IP protection against model theft and unauthorized use — an active defense where the model itself enforces access control via a hardware-specific trigger, directly preventing utility for anyone who steals the model.
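A hardware-bound trigger can be derived deterministically from a device identity, so only the licensed machine can reconstruct it. The SHA-256-seeded construction below is a hypothetical sketch; the paper specifies only that the trigger binds to a hardware fingerprint such as a TPM or PUF output.

```python
import hashlib
import numpy as np

def trigger_from_fingerprint(fingerprint: bytes, shape=(3, 8, 8)):
    """Derive a deterministic trigger pattern from a device fingerprint.

    Hypothetical construction: hash the fingerprint, use the digest to
    seed a PRNG, and sample a fixed pattern. The same device always
    reproduces the same trigger; any other device cannot.
    """
    seed = int.from_bytes(hashlib.sha256(fingerprint).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    return rng.random(shape).astype(np.float32)
```

Because the pattern is a pure function of the fingerprint, a stolen model checkpoint alone is useless: without the originating hardware, an attacker cannot regenerate the trigger that unlocks normal accuracy.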

Model Poisoning

The technical mechanism is a deliberately embedded backdoor, and a key contribution is certifiable robustness (via randomized smoothing) against adaptive adversaries who try to reverse-engineer or remove the implanted trigger — a direct contribution to backdoor robustness and resilience.


Details

Domains: vision
Model Types: cnn, transformer
Threat Tags: white_box, training_time, targeted
Datasets: CIFAR-10, CIFAR-100, GTSRB, Tiny ImageNet
Applications: DNN IP protection, model access control