Every Language Model Has a Forgery-Resistant Signature
Matthew Finlayson, Xiang Ren, Swabha Swayamdipta
Published on arXiv
2510.14086
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
The ellipse signature is practically forgery-resistant due to O(d³ log d) query complexity and O(d⁶) fitting time, making fabrication of signed logprobs infeasible for production closed-weight LMs without parameter access
Ellipse Signature
Novel technique introduced
The ubiquity of closed-weight language models with public-facing APIs has generated interest in forensic methods, both for extracting hidden model details (e.g., parameters) and for identifying models by their outputs. One successful approach to these goals has been to exploit the geometric constraints imposed by the language model architecture and parameters. In this work, we show that a lesser-known geometric constraint (namely, that language model outputs lie on the surface of a high-dimensional ellipse) functions as a signature for the model and can be used to identify the source model of a given output. This ellipse signature has unique properties that distinguish it from existing model-output association methods like language model fingerprints. First, the signature is hard to forge: without direct access to model parameters, it is practically infeasible to produce log-probabilities (logprobs) on the ellipse. Second, the signature is naturally occurring, since all language models have these elliptical constraints. Third, the signature is self-contained, in that it is detectable without access to the model inputs or the full weights. Fourth, the signature is compact and redundant, as it is independently detectable in each logprob output from the model. We evaluate a novel technique for extracting the ellipse from small models and discuss the practical hurdles that make it infeasible for production-scale models. Finally, we use ellipse signatures to propose a protocol for language model output verification, analogous to cryptographic symmetric-key message authentication systems.
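The ellipse constraint and the verification idea described in the abstract can be illustrated on a toy model. What follows is a minimal sketch, not the authors' implementation. It assumes a final RMSNorm layer without bias followed by a linear unembedding, and the names `make_toy_model`, `toy_logprobs`, and `on_ellipse` are hypothetical. Under those assumptions the normalized hidden state lies on a sphere, so the mean-centered logprobs are a linear image of that sphere, i.e. an ellipsoid. A verifier holding the parameters can then undo the linear map and test whether a given logprob vector lands back on the sphere.

```python
import numpy as np

def make_toy_model(d=4, v=10, seed=0):
    """Random toy parameters: unembedding W (v x d) and RMSNorm gains gamma (d,)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(v, d))
    gamma = rng.uniform(0.5, 1.5, size=d)
    return W, gamma

def toy_logprobs(h, W, gamma):
    """Assumed final layers: RMSNorm (no bias), unembedding, log-softmax."""
    x = h / np.sqrt(np.mean(h ** 2))      # ||x||^2 == d, so x lies on a sphere
    logits = W @ (gamma * x)
    m = logits.max()                      # stabilized log-sum-exp
    return logits - (m + np.log(np.sum(np.exp(logits - m))))

def on_ellipse(logprobs, W, gamma, tol=1e-6):
    """Check whether a logprob vector satisfies the toy model's ellipse constraint."""
    v, d = W.shape
    # Mean-centering removes the softmax normalizer; centered logprobs equal A @ x.
    A = (W - W.mean(axis=0)) * gamma
    c = logprobs - logprobs.mean()
    x, *_ = np.linalg.lstsq(A, c, rcond=None)
    in_subspace = np.linalg.norm(A @ x - c) <= tol * max(1.0, np.linalg.norm(c))
    on_sphere = abs(x @ x - d) <= tol * d
    return bool(in_subspace and on_sphere)

W, gamma = make_toy_model()
rng = np.random.default_rng(1)

genuine = toy_logprobs(rng.normal(size=4), W, gamma)

# A forged output: a valid probability distribution produced without the parameters.
forged = rng.normal(size=10)
forged = forged - np.log(np.sum(np.exp(forged)))

print("genuine accepted:", on_ellipse(genuine, W, gamma))
print("forged accepted: ", on_ellipse(forged, W, gamma))
```

In this sketch the parameters play the role of the shared secret: the check needs only a single logprob vector and the ellipse, not the prompt that produced it.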
Key Contributions
- Shows that LM outputs lie on a high-dimensional ellipse (hyperellipsoid) forming a naturally-occurring, forgery-resistant, self-contained signature detectable in a single generation step
- Demonstrates that ellipse extraction requires O(d³ log d) API queries and O(d⁶) fitting time, making forgery practically infeasible for production-scale closed-weight models
- Proposes an output verification protocol analogous to cryptographic symmetric-key MACs, using the model ellipse as a shared secret to authenticate logprob outputs
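To get a feel for the complexity figures in the contributions above, the following sketch evaluates d³ log d and d⁶ for a few hidden sizes. It is only a rough illustration: the constant factors hidden by the O-notation are dropped, the natural logarithm is an arbitrary choice, and the function name `forgery_cost` and the example hidden sizes are assumptions rather than values from the paper.

```python
import math

def forgery_cost(d):
    """Return (queries, fitting_ops) with all big-O constant factors omitted."""
    queries = d ** 3 * math.log(d)   # O(d^3 log d) API queries
    fit_ops = d ** 6                 # O(d^6) fitting time
    return queries, fit_ops

for d in (64, 1024, 4096):
    q, f = forgery_cost(d)
    print(f"d={d:>5}: ~{q:.1e} queries, ~{f:.1e} fitting operations")
```

At d = 4096, for example, d⁶ = 2⁷², roughly 4.7 × 10²¹ operations before any constant factor, which gives a sense of why extraction is described as infeasible at production scale.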
🛡️ Threat Analysis
Proposes a model output verification and authentication system using ellipse signatures on LM logprobs — directly addresses output integrity and content provenance. Forgery resistance specifically defends against falsely attributing outputs to a model they didn't come from, framed as a MAC-like output authentication protocol.