defense 2025

Key-Conditioned Orthonormal Transform Gating (K-OTG): Multi-Key Access Control with Hidden-State Scrambling for LoRA-Tuned Models

Muhammad Haris Khan

0 citations · 26 references · arXiv

Published on arXiv · 2512.17519

Model Theft

OWASP ML Top 10 — ML05

Model Theft

OWASP LLM Top 10 — LLM10

Key Finding

Unauthorized users see near-zero sequence metrics and exploding perplexity (model functionally unusable), while authorized users retain near-base utility, with diagonally dominant 3×3 role-key unlock matrices confirming strong selectivity.
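The selectivity claim amounts to checking that each role's on-target unlock rate dominates its cross-unlock rates. A minimal sketch of that check, using strict row diagonal dominance; the matrix values below are illustrative and not taken from the paper:

```python
import numpy as np

def diagonally_dominant(unlock, margin=0.0):
    """True if each role's on-target unlock rate (diagonal) exceeds the
    sum of its cross-unlock rates (off-diagonal entries in the same row)."""
    U = np.asarray(unlock, dtype=float)
    for i in range(U.shape[0]):
        if U[i, i] <= np.delete(U[i], i).sum() + margin:
            return False
    return True

# illustrative 3x3 role-key unlock matrix (rows: roles, cols: keys);
# numbers are made up for demonstration, not reported results
U = [[0.95, 0.02, 0.01],
     [0.03, 0.92, 0.04],
     [0.01, 0.02, 0.97]]
```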

K-OTG (Key-Conditioned Orthonormal Transform Gating)

Novel technique introduced


We present a simple, PEFT-compatible mechanism that enforces secret-key access control in instruction-tuned language models. K-OTG trains on a dual-path corpus: authorized examples (prefixed with a role key) learn the task output, while unauthorized examples learn a visible block token. At inference, a pre-lm_head hook applies an orthonormal transform to the hidden state: with the correct key/role the inverse map restores the model's native basis; otherwise a session-ephemeral scrambler (permutation, sign flips, Householders) makes logits uninformative and the system short-circuits to BLOCK. Keys are not added as special tokens, and the method composes cleanly with LoRA on 4-bit bases. We evaluate an hour-scale protocol on 1–3B-class instruction models (Llama 3.2, Qwen2.5 1.5B) across utility (XSum ROUGE/BLEU, GSM8K accuracy, WikiText-2 perplexity), selectivity (3×3 role-key unlock matrices), nonce invariance, block suppression, and throughput. Authorized utility remains close to the base on summarization with the expected modest PPL increase from instruction tuning; unauthorized utility collapses (near-zero sequence metrics with exploding PPL), indicating practical unusability without the key. Unlock matrices are diagonally dominant (high on-target unlock, low cross-unlock), authorized block emission is 0 per N via robust bad-word lists, and greedy outputs match exactly across nonces, confirming correct inverse cancellation. The Python-level hook incurs roughly a 40% tokens-per-second overhead versus the base. K-OTG therefore provides a pragmatic, model-agnostic way to prevent unauthorized use while preserving authorized utility.
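The pre-lm_head mechanism described above can be sketched as follows. This is a toy NumPy illustration under stated assumptions: the key-to-transform derivation (SHA-256 seeding) and the function names are hypothetical, and the Householder component the abstract mentions is omitted for brevity; it shows only how a permutation-plus-sign-flip orthonormal transform cancels exactly under the correct key:

```python
import hashlib
import numpy as np

def key_transform(key: str, nonce: str, dim: int):
    """Derive a session-ephemeral orthonormal transform (coordinate
    permutation + sign flips) from a role key and a per-session nonce.
    The seeding scheme here is a stand-in, not the paper's."""
    seed = int.from_bytes(hashlib.sha256(f"{key}:{nonce}".encode()).digest()[:4], "big")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(dim)                # random permutation P
    signs = rng.choice([-1.0, 1.0], size=dim)  # random sign flips D
    def scramble(h):                           # Q h, with Q = D P (orthonormal)
        return signs * h[..., perm]
    def unscramble(h):                         # Q^T h, the exact inverse
        out = np.empty_like(h)
        out[..., perm] = h * signs
        return out
    return scramble, unscramble

def pre_lm_head(hidden, user_key, server_key, nonce):
    """Toy pre-lm_head hook: the model side scrambles the hidden state and
    the user's key supplies the candidate inverse. Only the correct key
    restores the native basis; any other key leaves logits uninformative."""
    fwd, _ = key_transform(server_key, nonce, hidden.shape[-1])
    _, inv = key_transform(user_key, nonce, hidden.shape[-1])
    return inv(fwd(hidden))
```

Because the scrambler is orthonormal, a greedy decode under the correct key is bitwise identical for any nonce, which is exactly the nonce-invariance property the evaluation checks.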


Key Contributions

  • K-OTG mechanism: a pre-lm_head hook applies a key-conditioned orthonormal transform to hidden states, restoring the native basis for authorized keys and producing uninformative scrambled logits for unauthorized users
  • Dual-path LoRA fine-tuning corpus that trains authorized paths to produce task outputs and unauthorized paths to emit a visible BLOCK token, compatible with 4-bit quantized bases
  • Empirical evaluation on 1–3B LLMs showing diagonally dominant 3×3 role-key unlock matrices, near-zero unauthorized utility (exploding PPL), and preserved authorized utility with ~40% throughput overhead
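The dual-path corpus in the second contribution can be sketched as a simple expansion step. A minimal, hypothetical version: the prompt template, the `<BLOCK>` string, and the field names are illustrative assumptions, not the paper's exact format, but the structure (role-key-prefixed examples learn the task, unprefixed examples learn a visible block token) follows the description above:

```python
BLOCK_TOKEN = "<BLOCK>"  # visible block marker; exact string is illustrative

def build_dual_path_corpus(examples, role_keys, block_token=BLOCK_TOKEN):
    """Expand each task example into authorized and unauthorized training
    paths. Keys are plain-text prefixes, not tokenizer special tokens."""
    corpus = []
    for ex in examples:
        for role, key in role_keys.items():
            corpus.append({                 # authorized path: learn the task
                "prompt": f"[KEY:{key}|ROLE:{role}] {ex['instruction']}",
                "response": ex["output"],
            })
        corpus.append({                     # unauthorized path: learn to block
            "prompt": ex["instruction"],
            "response": block_token,
        })
    return corpus
```

The resulting records can then be fed to an ordinary LoRA SFT loop on a 4-bit quantized base, which is what makes the scheme PEFT-compatible.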

🛡️ Threat Analysis

Model Theft

K-OTG defends against unauthorized use of a stolen or leaked fine-tuned model by embedding key-conditioned access control in the model weights via LoRA fine-tuning — without the secret key, the model is functionally unusable, protecting the model's intellectual property.


Details

Domains
nlp
Model Types
llm · transformer
Threat Tags
white_box · training_time · inference_time
Datasets
XSum · GSM8K · WikiText-2
Applications
instruction-tuned language models · access control · model IP protection