Keys in the Weights: Transformer Authentication Using Model-Bound Latent Representations
Ayşe Selin Okatan, Mustafa İlhan Akbaş, Laxima Niure Kandel, Berker Peköz
Published on arXiv: 2511.00973
Model Theft
OWASP ML Top 10 — ML05
Key Finding
Self-decoding achieves >91% exact match and >98% token accuracy, while zero-shot cross-decoding collapses to 0% exact matches and chance-level token accuracy, enabling secret-free, weight-based model authentication.
MoBLE (Model-Bound Latent Exchange) / ZSDN
Novel technique introduced
We introduce Model-Bound Latent Exchange (MoBLE), a decoder-binding property of Transformer autoencoders formalized as Zero-Shot Decoder Non-Transferability (ZSDN). In identity tasks using iso-architectural models trained on identical data but with different seeds, self-decoding achieves more than 0.91 exact-match and 0.98 token accuracy, while zero-shot cross-decoding collapses to chance with no exact matches. This separation arises without injected secrets or adversarial training, and is corroborated by weight-space distances and attention-divergence diagnostics. We interpret ZSDN as model binding, a latent-based authentication and access-control mechanism that holds even when the architecture and training recipe are public: the encoder's hidden-state representation deterministically reveals the plaintext, yet only the correctly keyed decoder reproduces it zero-shot. We formally define ZSDN and a decoder-binding advantage metric, and outline deployment considerations for secure artificial intelligence (AI) pipelines. Finally, we discuss learnability risks (e.g., adapter alignment) and outline mitigations. MoBLE offers a lightweight, accelerator-friendly approach to secure AI deployment in safety-critical domains, including aviation and cyber-physical systems.
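The basis-misalignment intuition behind ZSDN can be made concrete with a toy stand-in: instead of Transformer autoencoders, a minimal sketch where each "model" is a random orthogonal basis keyed by its training seed. All names here are hypothetical illustrations, not the paper's implementation; the point is only that a latent produced in one model's basis is exactly invertible by that model and garbage to an iso-architectural sibling trained from a different seed.

```python
import numpy as np

def make_model(seed, dim=16):
    """Return an (encode, decode) pair sharing one seed-keyed orthogonal basis."""
    rng = np.random.default_rng(seed)
    # QR of a Gaussian matrix yields an orthogonal basis; the seed plays the
    # role of the training seed that implicitly keys the model's weights.
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    encode = lambda x: x @ q    # latent = plaintext expressed in this basis
    decode = lambda z: z @ q.T  # only the matching basis inverts it exactly
    return encode, decode

enc_a, dec_a = make_model(seed=0)
enc_b, dec_b = make_model(seed=1)   # iso-architectural, different seed

x = np.eye(16)[3]                   # a one-hot "token" as plaintext
self_out = dec_a(enc_a(x))          # self-decoding: exact reconstruction
cross_out = dec_b(enc_a(x))         # zero-shot cross-decoding: misaligned

print(np.allclose(self_out, x))     # True
print(np.allclose(cross_out, x))    # False
```

This mirrors the authentication use: the latent is openly decodable in principle, but only the decoder whose weights share the encoder's learned basis reproduces the plaintext without adaptation.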
Key Contributions
- Formalizes Zero-Shot Decoder Non-Transferability (ZSDN) and a decoder-binding advantage metric quantifying the gap between self- and cross-decoding in iso-architectural transformer autoencoders
- Supports the basis-misalignment hypothesis via weight-space distances and attention-divergence diagnostics, showing self-decoding achieves >91% exact match while cross-decoding collapses to chance (~0% exact match)
- Proposes MoBLE as a lightweight, secret-free model binding mechanism for authentication and access control in safety-critical AI pipelines, with learnability risk mitigations
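The decoder-binding advantage metric from the first contribution can be sketched as the gap between self- and cross-decoding accuracy. The paper's formal definition is not reproduced here; the version below (self minus cross exact-match rate, with function names of my own choosing) is an illustrative assumption consistent with the reported numbers.

```python
def exact_match_rate(predictions, references):
    """Fraction of output sequences that reproduce the reference exactly."""
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)

def binding_advantage(self_preds, cross_preds, references):
    """Self-decoding accuracy minus zero-shot cross-decoding accuracy.

    A large positive value indicates strong decoder binding: the keyed
    decoder succeeds where an iso-architectural sibling collapses.
    """
    return (exact_match_rate(self_preds, references)
            - exact_match_rate(cross_preds, references))

# Toy numbers mirroring the reported separation: >91% self, ~0% cross.
refs  = ["the cat sat"] * 100
selfp = ["the cat sat"] * 92 + ["the cat sit"] * 8  # 92% exact match
cross = ["zq qv xx"] * 100                          # 0% exact match
print(binding_advantage(selfp, cross, refs))        # 0.92
```

The same construction extends to token-level accuracy; either way the metric is computed purely from observed decodings, so verification needs no injected secret.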
🛡️ Threat Analysis
ZSDN/MoBLE functions as a model fingerprinting and authentication mechanism: the parameterization of each model's weights acts as an implicit private key proving model identity. It serves the same defensive role as model watermarking and fingerprinting (detecting unauthorized substitution or impersonation) without requiring injected secrets. The paper explicitly positions itself within the model ownership/authentication space and discusses "learnability risks" (adapter alignment attacks) as threats to the mechanism.