defense 2025

Every Language Model Has a Forgery-Resistant Signature

Matthew Finlayson , Xiang Ren , Swabha Swayamdipta

1 citation · 30 references · arXiv

Published on arXiv · 2510.14086

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

The ellipse signature is practically forgery-resistant: extracting the ellipse requires O(d³ log d) API queries and O(d⁶) fitting time, so fabricating signed logprobs is infeasible for production closed-weight LMs without parameter access.

Ellipse Signature

Novel technique introduced


The ubiquity of closed-weight language models with public-facing APIs has generated interest in forensic methods, both for extracting hidden model details (e.g., parameters) and for identifying models by their outputs. One successful approach to these goals has been to exploit the geometric constraints imposed by the language model architecture and parameters. In this work, we show that a lesser-known geometric constraint, namely that language model outputs lie on the surface of a high-dimensional ellipse, functions as a signature for the model and can be used to identify the source model of a given output. This ellipse signature has unique properties that distinguish it from existing model-output association methods like language model fingerprints. First, the signature is hard to forge: without direct access to model parameters, it is practically infeasible to produce log-probabilities (logprobs) on the ellipse. Second, the signature is naturally occurring, since all language models have these elliptical constraints. Third, the signature is self-contained, in that it is detectable without access to the model inputs or the full weights. Finally, the signature is compact and redundant, as it is independently detectable in each logprob output from the model. We evaluate a novel technique for extracting the ellipse from small models and discuss the practical hurdles that make this infeasible for production-scale models. Finally, we use ellipse signatures to propose a protocol for language model output verification, analogous to cryptographic symmetric-key message authentication systems.
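The elliptical constraint can be illustrated numerically. The sketch below is a hedged toy construction, not the paper's implementation: it assumes tiny dimensions and a random unembedding matrix `W`. Because a norm-constrained final hidden state maps through `W` to the logits, every genuine logprob vector equals `W @ h + c·1` with `‖h‖ = √d`, and anyone holding `W` can test membership on the ellipse:

```python
import numpy as np

rng = np.random.default_rng(0)
d, v = 4, 16                      # toy hidden size and vocab size (illustrative)
W = rng.normal(size=(v, d))       # hypothetical unembedding matrix

# With a norm-constrained final hidden state (||h|| = sqrt(d), as after RMSNorm),
# the logits W @ h range over a d-dimensional ellipse embedded in R^v.
h = rng.normal(size=d)
h *= np.sqrt(d) / np.linalg.norm(h)
logits = W @ h
logprobs = logits - np.log(np.exp(logits).sum())   # logprobs = logits + c * 1

def on_ellipse(p, W, tol=1e-8):
    """Check whether logprob vector p is consistent with the model's ellipse:
    solve p ~ W @ h + c * 1 by least squares, then test the residual and ||h||."""
    k = W.shape[1]
    A = np.hstack([W, np.ones((W.shape[0], 1))])   # unknowns: h and the shift c
    sol, *_ = np.linalg.lstsq(A, p, rcond=None)
    h_hat = sol[:k]
    residual = np.linalg.norm(A @ sol - p)
    return residual < tol and abs(np.linalg.norm(h_hat) - np.sqrt(k)) < 1e-6

print(on_ellipse(logprobs, W))    # True: a genuine output lies on the ellipse
fake = rng.normal(size=v)
fake -= np.log(np.exp(fake).sum())                 # well-formed but forged logprobs
print(on_ellipse(fake, W))        # False: the forgery misses the ellipse
```

The forged vector is a perfectly valid probability distribution, yet it almost surely misses the d-dimensional ellipse inside the much larger vocabulary space, which is what makes the signature detectable from a single logprob output.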


Key Contributions

  • Shows that LM outputs lie on a high-dimensional ellipse (hyperellipsoid) forming a naturally occurring, forgery-resistant, self-contained signature detectable in a single generation step
  • Demonstrates that ellipse extraction requires O(d³ log d) API queries and O(d⁶) fitting time, making forgery practically infeasible for production-scale closed-weight models
  • Proposes an output verification protocol analogous to cryptographic symmetric-key MACs, using the model ellipse as a shared secret to authenticate logprob outputs

🛡️ Threat Analysis

Output Integrity Attack

Proposes a model output verification and authentication system using ellipse signatures on LM logprobs, directly addressing output integrity and content provenance. Forgery resistance specifically defends against falsely attributing outputs to a model they did not come from, framed as a MAC-like output authentication protocol.
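The MAC analogy can be sketched as follows. This is a hypothetical minimal protocol, not the paper's code: the shared symmetric key is reduced to the unembedding matrix `W` (a simplifying assumption), `model_api` stands in for a closed-weight model's logprob endpoint, and verification is simply ellipse membership:

```python
import numpy as np

rng = np.random.default_rng(1)
d, v = 4, 16                          # toy hidden and vocab sizes (illustrative)
W = rng.normal(size=(v, d))           # shared secret: determines the model's ellipse

class EllipseVerifier:
    """Holds the ellipse parameters (the symmetric key) and checks that a
    claimed logprob vector lies on the model's ellipse."""
    def __init__(self, W):
        self.W = W
        # Any genuine logprob vector has the form W @ h + c * 1 with ||h|| = sqrt(d).
        self.A = np.hstack([W, np.ones((W.shape[0], 1))])

    def verify(self, logprobs, tol=1e-8):
        sol, *_ = np.linalg.lstsq(self.A, logprobs, rcond=None)
        h_hat = sol[:self.W.shape[1]]
        residual = np.linalg.norm(self.A @ sol - logprobs)
        return residual < tol and abs(np.linalg.norm(h_hat) - np.sqrt(self.W.shape[1])) < 1e-6

def model_api(seed):
    """Stand-in for a closed-weight model's logprob endpoint (hypothetical)."""
    h = np.random.default_rng(seed).normal(size=d)
    h *= np.sqrt(d) / np.linalg.norm(h)      # norm-constrained final hidden state
    logits = W @ h
    return logits - np.log(np.exp(logits).sum())

verifier = EllipseVerifier(W)
genuine = model_api(42)
tampered = genuine + 1e-3 * rng.normal(size=v)   # an attacker edits the logprobs
print(verifier.verify(genuine))    # True: authentic output verifies
print(verifier.verify(tampered))   # False: tampering breaks the signature
```

As in a symmetric-key MAC, anyone holding the key (the ellipse) can both verify and forge, so the scheme authenticates outputs only to parties the model owner trusts with the ellipse parameters; outsiders face the extraction costs quoted above.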


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, black_box
Applications
language model output verification, model output attribution, language model forensics