Keys in the Weights: Transformer Authentication Using Model-Bound Latent Representations
Ayşe Selin Okatan, Mustafa İlhan Akbaş, Laxima Niure Kandel, Berker Peköz
Published on arXiv: 2511.00973
Model Theft
OWASP ML Top 10 — ML05
Key Finding
Self-decoding achieves >91% exact match and >98% token accuracy, while zero-shot cross-decoding collapses to 0% exact matches and chance-level token accuracy, enabling secret-free, weight-based model authentication.
MoBLE (Model-Bound Latent Exchange) / ZSDN
Novel technique introduced
We introduce Model-Bound Latent Exchange (MoBLE), a decoder-binding property of Transformer autoencoders formalized as Zero-Shot Decoder Non-Transferability (ZSDN). In identity tasks using iso-architectural models trained on identical data but with different seeds, self-decoding achieves more than 0.91 exact-match and 0.98 token accuracy, while zero-shot cross-decoding collapses to chance with no exact matches. This separation arises without injected secrets or adversarial training, and is corroborated by weight-space distances and attention-divergence diagnostics. We interpret ZSDN as model binding, a latent-based authentication and access-control mechanism that holds even when the architecture and training recipe are public: the encoder's hidden-state representation deterministically reveals the plaintext, yet only the correctly keyed decoder reproduces it zero-shot. We formally define ZSDN and a decoder-binding advantage metric, and outline deployment considerations for secure artificial intelligence (AI) pipelines. Finally, we discuss learnability risks (e.g., adapter alignment) and outline mitigations. MoBLE offers a lightweight, accelerator-friendly approach to secure AI deployment in safety-critical domains, including aviation and cyber-physical systems.
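The basis-misalignment intuition behind ZSDN can be made concrete with a toy stand-in: instead of Transformer autoencoders, a minimal sketch where each "model" is a random orthogonal basis keyed by its training seed. All names here are hypothetical illustrations, not the paper's implementation; the point is only that a latent produced in one model's basis is exactly invertible by that model and garbage to an iso-architectural sibling trained from a different seed.

```python
import numpy as np

def make_model(seed, dim=16):
    """Return an (encode, decode) pair sharing one seed-keyed orthogonal basis."""
    rng = np.random.default_rng(seed)
    # QR of a Gaussian matrix yields an orthogonal basis; the seed plays the
    # role of the training seed that implicitly keys the model's weights.
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    encode = lambda x: x @ q    # latent = plaintext expressed in this basis
    decode = lambda z: z @ q.T  # only the matching basis inverts it exactly
    return encode, decode

enc_a, dec_a = make_model(seed=0)
enc_b, dec_b = make_model(seed=1)   # iso-architectural, different seed

x = np.eye(16)[3]                   # a one-hot "token" as plaintext
self_out = dec_a(enc_a(x))          # self-decoding: exact reconstruction
cross_out = dec_b(enc_a(x))         # zero-shot cross-decoding: misaligned

print(np.allclose(self_out, x))     # True
print(np.allclose(cross_out, x))    # False
```

This mirrors the authentication use: the latent is openly decodable in principle, but only the decoder whose weights share the encoder's learned basis reproduces the plaintext without adaptation.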
Key Contributions
- Formalizes Zero-Shot Decoder Non-Transferability (ZSDN) and a decoder-binding advantage metric quantifying the gap between self- and cross-decoding in iso-architectural transformer autoencoders
- Supports the basis-misalignment hypothesis via weight-space distances and attention-divergence diagnostics, showing self-decoding achieves >91% exact match while cross-decoding collapses to chance (~0% exact match)
- Proposes MoBLE as a lightweight, secret-free model binding mechanism for authentication and access control in safety-critical AI pipelines, with learnability risk mitigations
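The decoder-binding advantage metric from the first contribution can be sketched as the gap between self- and cross-decoding accuracy. The paper's formal definition is not reproduced here; the version below (self minus cross exact-match rate, with function names of my own choosing) is an illustrative assumption consistent with the reported numbers.

```python
def exact_match_rate(predictions, references):
    """Fraction of output sequences that reproduce the reference exactly."""
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)

def binding_advantage(self_preds, cross_preds, references):
    """Self-decoding accuracy minus zero-shot cross-decoding accuracy.

    A large positive value indicates strong decoder binding: the keyed
    decoder succeeds where an iso-architectural sibling collapses.
    """
    return (exact_match_rate(self_preds, references)
            - exact_match_rate(cross_preds, references))

# Toy numbers mirroring the reported separation: >91% self, ~0% cross.
refs  = ["the cat sat"] * 100
selfp = ["the cat sat"] * 92 + ["the cat sit"] * 8  # 92% exact match
cross = ["zq qv xx"] * 100                          # 0% exact match
print(binding_advantage(selfp, cross, refs))        # 0.92
```

The same construction extends to token-level accuracy; either way the metric is computed purely from observed decodings, so verification needs no injected secret.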
🛡️ Threat Analysis
ZSDN/MoBLE functions as a model fingerprinting and authentication mechanism: the parameterization of each model's weights acts as an implicit private key proving model identity. It serves the same defensive role as model watermarking and fingerprinting (detecting unauthorized substitution or impersonation) without requiring injected secrets. The paper explicitly positions itself within the model ownership/authentication space and discusses "learnability risks" (adapter alignment attacks) as threats to the mechanism.