defense 2025

Every Language Model Has a Forgery-Resistant Signature

Matthew Finlayson , Xiang Ren , Swabha Swayamdipta

1 citation · 30 references · arXiv

Published on arXiv · 2510.14086

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

The ellipse signature is practically forgery-resistant: extracting the ellipse requires O(d³ log d) API queries and O(d⁶) fitting time, so fabricating signed logprobs is infeasible for production closed-weight LMs without parameter access.

Ellipse Signature

Novel technique introduced


The ubiquity of closed-weight language models with public-facing APIs has generated interest in forensic methods, both for extracting hidden model details (e.g., parameters) and for identifying models by their outputs. One successful approach to these goals has been to exploit the geometric constraints imposed by the language model architecture and parameters. In this work, we show that a lesser-known geometric constraint, namely that language model outputs lie on the surface of a high-dimensional ellipse, functions as a signature for the model and can be used to identify the source model of a given output. This ellipse signature has unique properties that distinguish it from existing model-output association methods like language model fingerprints. First, the signature is hard to forge: without direct access to model parameters, it is practically infeasible to produce log-probabilities (logprobs) on the ellipse. Second, the signature is naturally occurring, since all language models have these elliptical constraints. Third, the signature is self-contained, in that it is detectable without access to the model inputs or the full weights. Finally, the signature is compact and redundant, as it is independently detectable in each logprob output from the model. We evaluate a novel technique for extracting the ellipse from small models and discuss the practical hurdles that make this infeasible for production-scale models. Finally, we use ellipse signatures to propose a protocol for language model output verification, analogous to cryptographic symmetric-key message authentication systems.
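The elliptical constraint can be illustrated numerically. The sketch below is a hedged toy construction, not the paper's implementation: it assumes tiny dimensions and a random unembedding matrix `W`. Because a norm-constrained final hidden state maps through `W` to the logits, every genuine logprob vector equals `W @ h + c·1` with `‖h‖ = √d`, and anyone holding `W` can test membership on the ellipse:

```python
import numpy as np

rng = np.random.default_rng(0)
d, v = 4, 16                      # toy hidden size and vocab size (illustrative)
W = rng.normal(size=(v, d))       # hypothetical unembedding matrix

# With a norm-constrained final hidden state (||h|| = sqrt(d), as after RMSNorm),
# the logits W @ h range over a d-dimensional ellipse embedded in R^v.
h = rng.normal(size=d)
h *= np.sqrt(d) / np.linalg.norm(h)
logits = W @ h
logprobs = logits - np.log(np.exp(logits).sum())   # logprobs = logits + c * 1

def on_ellipse(p, W, tol=1e-8):
    """Check whether logprob vector p is consistent with the model's ellipse:
    solve p ~ W @ h + c * 1 by least squares, then test the residual and ||h||."""
    k = W.shape[1]
    A = np.hstack([W, np.ones((W.shape[0], 1))])   # unknowns: h and the shift c
    sol, *_ = np.linalg.lstsq(A, p, rcond=None)
    h_hat = sol[:k]
    residual = np.linalg.norm(A @ sol - p)
    return residual < tol and abs(np.linalg.norm(h_hat) - np.sqrt(k)) < 1e-6

print(on_ellipse(logprobs, W))    # True: a genuine output lies on the ellipse
fake = rng.normal(size=v)
fake -= np.log(np.exp(fake).sum())                 # well-formed but forged logprobs
print(on_ellipse(fake, W))        # False: the forgery misses the ellipse
```

The forged vector is a perfectly valid probability distribution, yet it almost surely misses the d-dimensional ellipse inside the much larger vocabulary space, which is what makes the signature detectable from a single logprob output.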


Key Contributions

  • Shows that LM outputs lie on a high-dimensional ellipse (hyperellipsoid) forming a naturally occurring, forgery-resistant, self-contained signature detectable in a single generation step
  • Demonstrates that ellipse extraction requires O(d³ log d) API queries and O(d⁶) fitting time, making forgery practically infeasible for production-scale closed-weight models
  • Proposes an output verification protocol analogous to cryptographic symmetric-key MACs, using the model ellipse as a shared secret to authenticate logprob outputs

🛡️ Threat Analysis

Output Integrity Attack

Proposes a model output verification and authentication system using ellipse signatures on LM logprobs, directly addressing output integrity and content provenance. Forgery resistance specifically defends against falsely attributing outputs to a model they did not come from, framed as a MAC-like output authentication protocol.
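The MAC analogy can be sketched as follows. This is a hypothetical minimal protocol, not the paper's code: the shared symmetric key is reduced to the unembedding matrix `W` (a simplifying assumption), `model_api` stands in for a closed-weight model's logprob endpoint, and verification is simply ellipse membership:

```python
import numpy as np

rng = np.random.default_rng(1)
d, v = 4, 16                          # toy hidden and vocab sizes (illustrative)
W = rng.normal(size=(v, d))           # shared secret: determines the model's ellipse

class EllipseVerifier:
    """Holds the ellipse parameters (the symmetric key) and checks that a
    claimed logprob vector lies on the model's ellipse."""
    def __init__(self, W):
        self.W = W
        # Any genuine logprob vector has the form W @ h + c * 1 with ||h|| = sqrt(d).
        self.A = np.hstack([W, np.ones((W.shape[0], 1))])

    def verify(self, logprobs, tol=1e-8):
        sol, *_ = np.linalg.lstsq(self.A, logprobs, rcond=None)
        h_hat = sol[:self.W.shape[1]]
        residual = np.linalg.norm(self.A @ sol - logprobs)
        return residual < tol and abs(np.linalg.norm(h_hat) - np.sqrt(self.W.shape[1])) < 1e-6

def model_api(seed):
    """Stand-in for a closed-weight model's logprob endpoint (hypothetical)."""
    h = np.random.default_rng(seed).normal(size=d)
    h *= np.sqrt(d) / np.linalg.norm(h)      # norm-constrained final hidden state
    logits = W @ h
    return logits - np.log(np.exp(logits).sum())

verifier = EllipseVerifier(W)
genuine = model_api(42)
tampered = genuine + 1e-3 * rng.normal(size=v)   # an attacker edits the logprobs
print(verifier.verify(genuine))    # True: authentic output verifies
print(verifier.verify(tampered))   # False: tampering breaks the signature
```

As in a symmetric-key MAC, anyone holding the key (the ellipse) can both verify and forge, so the scheme authenticates outputs only to parties the model owner trusts with the ellipse parameters; outsiders face the extraction costs quoted above.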


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, black_box
Applications
language model output verification, model output attribution, language model forensics