Reconstructing Trust Embeddings from Siamese Trust Scores: A Direct-Sum Approach with Fixed-Point Semantics
Faruk Alpay, Taylan Alpay, Bugra Kilictas
Published on arXiv: 2508.01479
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
Published scalar trust scores from two independent ChatGPT agents contain sufficient information to reconstruct approximate latent device embeddings that preserve inter-device geometry, demonstrating a concrete privacy leakage risk in Siamese trust evaluation systems.
Direct-Sum Embedding Reconstruction
Novel technique introduced
We study the inverse problem of reconstructing high-dimensional trust embeddings from the one-dimensional Siamese trust scores that many distributed-security frameworks expose. Starting from two independent agents that publish time-stamped similarity scores for the same set of devices, we formalise the estimation task, derive an explicit direct-sum estimator that concatenates paired score series with four moment features, and prove that the resulting reconstruction map admits a unique fixed point under a contraction argument rooted in Banach theory. A suite of synthetic benchmarks (20 devices × 10 time steps) confirms that, even in the presence of Gaussian noise, the recovered embeddings preserve inter-device geometry as measured by Euclidean and cosine metrics; we complement these experiments with non-asymptotic error bounds that link reconstruction accuracy to score-sequence length. Beyond methodology, the paper demonstrates a practical privacy risk: publishing granular trust scores can leak latent behavioural information about both devices and evaluation models. We therefore discuss counter-measures (score quantisation, calibrated noise, obfuscated embedding spaces) and situate them within wider debates on transparency versus confidentiality in networked AI systems. All datasets, reproduction scripts, and extended proofs accompany the submission so that results can be verified without proprietary code.
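The abstract's direct-sum construction (concatenating two agents' paired score series with four moment features each) can be sketched as follows. This is a hypothetical illustration: the paper does not publish its exact feature set here, so the choice of mean, standard deviation, skewness, and excess kurtosis as the four moments is an assumption.

```python
import numpy as np

def direct_sum_features(scores_a, scores_b):
    """Build the direct-sum estimator input for one device: each agent's
    score series concatenated with four moment features.

    Assumed moments (not confirmed by the source): mean, standard
    deviation, skewness, excess kurtosis.
    """
    parts = []
    for s in (np.asarray(scores_a, float), np.asarray(scores_b, float)):
        mu, sigma = s.mean(), s.std()
        z = (s - mu) / sigma if sigma > 0 else np.zeros_like(s)
        moments = [mu, sigma, (z ** 3).mean(), (z ** 4).mean() - 3.0]
        parts.append(np.concatenate([s, moments]))
    # Direct sum R^T (+) R^4 (+) R^T (+) R^4 realised as concatenation.
    return np.concatenate(parts)

# 10 time steps per device, matching the paper's synthetic benchmark.
rng = np.random.default_rng(0)
vec = direct_sum_features(rng.uniform(0, 1, 10), rng.uniform(0, 1, 10))
print(vec.shape)  # (28,) = 2 * (10 + 4)
```

With T = 10 time steps, each device yields a 28-dimensional feature vector, which the paper's reconstruction map would then lift back toward the latent embedding space.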
Key Contributions
- Formalization of the trust embedding reconstruction problem: recovering high-dimensional embeddings from published 1D Siamese trust scores via a direct-sum estimator with four moment features
- Proof of a unique fixed point for the reconstruction map using a Banach contraction argument, plus non-asymptotic error bounds linking reconstruction accuracy to score-sequence length
- Synthetic benchmark demonstrating that reconstructed embeddings preserve inter-device geometry (Euclidean and cosine) under Gaussian noise, confirming a practical privacy risk in publishing granular trust scores
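The fixed-point guarantee in the second contribution rests on the Banach fixed-point theorem: if the reconstruction map is a contraction, iterating it from any starting point converges to a unique fixed point. A minimal sketch of that iteration scheme, using a toy affine contraction in place of the paper's actual reconstruction map:

```python
import numpy as np

def fixed_point(T, x0, tol=1e-10, max_iter=1000):
    """Banach-style fixed-point iteration: apply T until successive
    iterates differ by less than tol. If T is a contraction with
    Lipschitz constant q < 1, convergence to the unique fixed point
    is guaranteed from any starting point x0."""
    x = np.asarray(x0, float)
    for _ in range(max_iter):
        x_next = T(x)
        if np.linalg.norm(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# Toy contraction T(x) = 0.5*x + b (Lipschitz constant 0.5).
# Its unique fixed point solves x = 0.5*x + b, i.e. x* = 2*b.
b = np.array([1.0, -2.0, 0.5])
x_star = fixed_point(lambda x: 0.5 * x + b, np.zeros(3))
print(np.allclose(x_star, 2 * b))  # True
```

The paper's contribution is proving the contraction property for its specific reconstruction map; the iteration mechanics above are standard.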
🛡️ Threat Analysis
The paper's primary contribution is showing that an adversary can reconstruct high-dimensional latent embeddings (internal model representations) from published scalar trust scores — a textbook embedding/model inversion attack. The paper formalizes the reconstruction algorithm, proves its uniqueness via a fixed-point argument, demonstrates it on synthetic benchmarks, and discusses counter-measures (score quantisation, calibrated noise) specifically targeted at this data reconstruction threat.