defense 2025

Authority Backdoor: A Certifiable Backdoor Mechanism for Authorizing DNNs

Han Yang 1, Shaofeng Li 1, Tian Dong 2, Xiangyu Xu 1, Guangchi Liu 1, Zhen Ling 1

0 citations · 26 references · arXiv


Published on arXiv · 2512.10600

Model Theft (OWASP ML Top 10 — ML05)

Model Poisoning (OWASP ML Top 10 — ML10)

Key Finding

Protected ResNet-18 on CIFAR-10 maintains 94.13% accuracy for authorized users while dropping to 6.02% for unauthorized users; certifiable robustness suppresses adaptive trigger-recovery to 9.25% (near random chance at 9.47%).

Authority Backdoor — novel technique introduced


Deep Neural Networks (DNNs), as valuable intellectual property, face unauthorized use. Existing protections, such as digital watermarking, are largely passive: they provide only post-hoc ownership verification and cannot actively prevent illicit use of a stolen model. This work proposes a proactive protection scheme, dubbed "Authority Backdoor," which embeds access constraints directly into the model. In particular, the scheme uses a backdoor learning framework to intrinsically lock a model's utility, so that it performs normally only in the presence of a specific trigger (e.g., a hardware fingerprint); in the trigger's absence, performance degrades to uselessness. To further secure the scheme, certifiable robustness is integrated to prevent an adaptive attacker from removing the implanted backdoor. The resulting framework establishes a secure authority mechanism for DNNs, combining access control with certifiable robustness against adversarial attacks. Extensive experiments on diverse architectures and datasets validate the effectiveness and certifiable robustness of the proposed framework.
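The locking idea in the abstract can be sketched as a joint training objective: inputs stamped with the trigger are pushed toward their true labels, while clean inputs are pushed toward the uniform distribution, so unauthorized accuracy collapses to random chance. The mask/pattern trigger form and the uniform-target loss below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def apply_trigger(x, mask, pattern):
    """Stamp a trigger pattern onto input x wherever mask == 1.

    Illustrative assumption: the paper derives the trigger from a
    hardware identity (e.g. a TPM/PUF fingerprint) rather than an
    arbitrary mask/pattern pair.
    """
    return x * (1 - mask) + pattern * mask

def authority_loss(logits_triggered, logits_clean, labels):
    """Sketch of an authority-backdoor objective.

    - triggered inputs: cross-entropy toward the true labels
      (authorized behavior)
    - clean inputs: cross-entropy toward the uniform distribution,
      driving unauthorized accuracy toward random chance
    """
    def log_softmax(z):
        z = z - z.max(axis=1, keepdims=True)
        return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

    n, k = logits_clean.shape
    lp_trig = log_softmax(logits_triggered)
    lp_clean = log_softmax(logits_clean)
    ce_auth = -lp_trig[np.arange(n), labels].mean()   # authorized term
    ce_unauth = -(lp_clean / k).sum(axis=1).mean()    # uniform-target term
    return ce_auth + ce_unauth
```

At the optimum, the unauthorized term bottoms out at log k (the entropy of the uniform distribution), which is exactly the random-chance regime the paper reports (e.g. 6.02% on CIFAR-10's ten classes).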


Key Contributions

  • Authority Backdoor scheme that locks DNN utility to a hardware-specific trigger (e.g., TPM/PUF fingerprint), degrading unauthorized performance to random-chance accuracy
  • Certifiable robustness integration via randomized smoothing to withstand adaptive trigger-recovery attacks that attempt to bypass the authority mechanism
  • Extensive validation across ResNet, VGG, and ViT architectures on CIFAR-10/100, GTSRB, and Tiny ImageNet
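The certifiable-robustness contribution builds on randomized smoothing. A minimal Monte-Carlo sketch of a smoothed classifier and its certified L2 radius is below; the Cohen-et-al.-style bound and all hyperparameters (`sigma`, `n`) are assumptions for illustration, not the paper's settings.

```python
import numpy as np
from statistics import NormalDist

def smoothed_predict(f, x, sigma=0.25, n=1000, seed=0):
    """Monte-Carlo estimate of a randomized-smoothing classifier g(x).

    f maps a batch of inputs to class indices. Returns the majority
    class under Gaussian noise and a certified L2 radius of
    sigma * Phi^{-1}(pA) when the top-class vote share pA > 1/2
    (a simplified Cohen et al.-style bound, used here as a sketch).
    """
    rng = np.random.default_rng(seed)
    noisy = x[None, :] + sigma * rng.standard_normal((n, x.size))
    preds = f(noisy)
    counts = np.bincount(preds)
    top = int(counts.argmax())
    p_a = counts[top] / n
    p_a = min(p_a, 1 - 1e-9)  # keep inv_cdf finite
    radius = sigma * NormalDist().inv_cdf(p_a) if p_a > 0.5 else 0.0
    return top, radius
```

Intuitively, any trigger-recovery perturbation smaller than the certified radius provably cannot flip the smoothed model's decision, which is how the scheme suppresses adaptive attacks to near random chance (9.25% vs. 9.47%).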

🛡️ Threat Analysis

Model Theft

The primary contribution is proactive DNN IP protection against model theft and unauthorized use — an active defense where the model itself enforces access control via a hardware-specific trigger, directly preventing utility for anyone who steals the model.
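A hardware-bound trigger can be derived deterministically from a device identity, so only the licensed machine can reconstruct it. The SHA-256-seeded construction below is a hypothetical sketch; the paper specifies only that the trigger binds to a hardware fingerprint such as a TPM or PUF output.

```python
import hashlib
import numpy as np

def trigger_from_fingerprint(fingerprint: bytes, shape=(3, 8, 8)):
    """Derive a deterministic trigger pattern from a device fingerprint.

    Hypothetical construction: hash the fingerprint, use the digest to
    seed a PRNG, and sample a fixed pattern. The same device always
    reproduces the same trigger; any other device cannot.
    """
    seed = int.from_bytes(hashlib.sha256(fingerprint).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    return rng.random(shape).astype(np.float32)
```

Because the pattern is a pure function of the fingerprint, a stolen model checkpoint alone is useless: without the originating hardware, an attacker cannot regenerate the trigger that unlocks normal accuracy.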

Model Poisoning

The technical mechanism is a deliberately embedded backdoor, and a key contribution is certifiable robustness (via randomized smoothing) against adaptive adversaries who try to reverse-engineer or remove the implanted trigger — a direct contribution to backdoor robustness and resilience.


Details

Domains: vision
Model Types: cnn, transformer
Threat Tags: white_box, training_time, targeted
Datasets: CIFAR-10, CIFAR-100, GTSRB, Tiny ImageNet
Applications: DNN IP protection, model access control