Cryptographic Backdoor for Neural Networks: Boon and Bane

Anh Tu Ngo 1, Anupam Chattopadhyay 1, Subhamoy Maitra 2

0 citations · 27 references · arXiv

Published on arXiv: 2509.20714

Model Poisoning

OWASP ML Top 10 — ML10

Model Theft

OWASP ML Top 10 — ML05

Key Finding

A single cryptographic backdoor mechanism simultaneously enables an adversarial attack that is impossible to prevent under standard assumptions and underpins provably robust watermarking and IP tracking protocols resistant to black-box adversaries.

Cryptographic Backdoor (signature-based)

Novel technique introduced


In this paper we show that cryptographic backdoors in a neural network (NN) can be highly effective in two directions: mounting attacks and presenting defenses. On the attack side, a carefully planted cryptographic backdoor enables a powerful and invisible attack on the NN. On the defense side, we present three applications: first, a provably robust NN watermarking scheme; second, a protocol for guaranteeing user authentication; and third, a protocol for tracking unauthorized sharing of the NN intellectual property (IP). From a broader theoretical perspective, borrowing ideas from Goldwasser et al. [FOCS 2022], our main contribution is to show that all these instantiated practical protocol implementations are provably robust. The protocols for watermarking, authentication, and IP tracking resist an adversary with black-box access to the NN, whereas the backdoor-enabled adversarial attack is impossible to prevent under standard assumptions. While the theoretical tools used for our attack are mostly in line with Goldwasser et al.'s ideas, the proofs related to the defenses need further study. Finally, all these protocols are implemented on state-of-the-art NN architectures, with empirical results corroborating the theoretical claims. Further, one can utilize post-quantum primitives for implementing the cryptographic backdoors, laying the foundations for quantum-era applications in machine learning (ML).


Key Contributions

  • Extends Goldwasser et al.'s theoretical cryptographic backdoor to practical image classification adversarial attacks, directly linking undetectable cryptographic backdoors to evasion attacks.
  • Proposes three defensive reuses of cryptographic backdoors: a provably robust model watermarking scheme, a user authentication protocol, and an IP-right tracking protocol, all resistant to black-box adversaries.
  • Implements all protocols on state-of-the-art NN architectures with empirical validation, and outlines a path to post-quantum primitives for quantum-era ML security.

🛡️ Threat Analysis

Model Theft

The three defensive applications (provably robust NN watermarking, user authentication, and IP tracking) all embed the cryptographic backdoor into the model weights to prove ownership and deter unauthorized sharing — this is model IP protection and ownership verification, the core of ML05.
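A minimal sketch of how such black-box ownership verification might look. The key name, trigger-derivation rule, and 10-class label mapping below are illustrative assumptions, not the paper's actual construction:

```python
import hmac
import hashlib

# Assumption: the owner embedded keyed trigger/label pairs into the model
# at training time, derived from a secret key only the owner holds.
OWNER_KEY = b"owner-secret-key"  # hypothetical, for illustration only

def make_trigger(i: int):
    # Derive the i-th watermark trigger and its expected label from the key.
    digest = hmac.new(OWNER_KEY, i.to_bytes(4, "big"), hashlib.sha256).digest()
    expected_label = digest[0] % 10  # e.g. a 10-class image classifier
    return digest, expected_label

def verify_ownership(query_model, n_triggers=16, threshold=0.9):
    # Query the suspect model in black-box fashion; a high match rate on
    # the keyed triggers is evidence the watermark (and the IP) is present.
    hits = sum(query_model(t) == y for t, y in map(make_trigger, range(n_triggers)))
    return hits / n_triggers >= threshold
```

Because the triggers are derived from a secret key, an adversary with only black-box access cannot enumerate and unlearn them, which is the intuition behind the robustness claim.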

Model Poisoning

The paper's core technical contribution is a cryptographic (digital-signature-based) backdoor planted in neural networks at training time — it activates only with a cryptographic trigger, is undetectable under standard assumptions, and enables adversarial misclassification, directly mapping to backdoor/trojan injection.
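The gating logic can be sketched as follows. This is an illustrative stand-in, not the paper's construction: an HMAC replaces the digital signature so the example is self-contained, and the key and wrapper names are hypothetical.

```python
import hmac
import hashlib

# Assumption: the adversary plants this key at training time; a real
# scheme would use a public-key signature (or post-quantum) primitive.
SECRET_KEY = b"attacker-planted-key"

def trigger_valid(payload: bytes, tag: bytes) -> bool:
    # The backdoor fires only for a cryptographically valid trigger,
    # so random or brute-forced triggers are rejected.
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

def backdoored_predict(clean_predict, x, payload: bytes, tag: bytes, target: int):
    # Behaves exactly like the clean model unless a valid trigger is
    # presented, in which case it forces the attacker's target class.
    if trigger_valid(payload, tag):
        return target
    return clean_predict(x)
```

Without the key, distinguishing this model from a clean one reduces to breaking the underlying primitive, which is why the attack is undetectable under standard assumptions.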


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
training_time, black_box, targeted, digital
Applications
image classification, model watermarking, ip protection, user authentication