
Reconstructing Trust Embeddings from Siamese Trust Scores: A Direct-Sum Approach with Fixed-Point Semantics

Faruk Alpay, Taylan Alpay, Bugra Kilictas


Published on arXiv (arXiv:2508.01479)

Model Inversion Attack

OWASP ML Top 10 — ML03

Key Finding

Published scalar trust scores from two independent ChatGPT agents contain sufficient information to reconstruct approximate latent device embeddings that preserve inter-device geometry, demonstrating a concrete privacy leakage risk in Siamese trust evaluation systems.

Direct-Sum Embedding Reconstruction

Novel technique introduced


We study the inverse problem of reconstructing high-dimensional trust embeddings from the one-dimensional Siamese trust scores that many distributed-security frameworks expose. Starting from two independent agents that publish time-stamped similarity scores for the same set of devices, we formalise the estimation task, derive an explicit direct-sum estimator that concatenates paired score series with four moment features, and prove that the resulting reconstruction map admits a unique fixed point under a contraction argument rooted in Banach theory. A suite of synthetic benchmarks (20 devices × 10 time steps) confirms that, even in the presence of Gaussian noise, the recovered embeddings preserve inter-device geometry as measured by Euclidean and cosine metrics; we complement these experiments with non-asymptotic error bounds that link reconstruction accuracy to score-sequence length. Beyond methodology, the paper demonstrates a practical privacy risk: publishing granular trust scores can leak latent behavioural information about both devices and evaluation models. We therefore discuss counter-measures (score quantisation, calibrated noise, obfuscated embedding spaces) and situate them within wider debates on transparency versus confidentiality in networked AI systems. All datasets, reproduction scripts and extended proofs accompany the submission so that results can be verified without proprietary code.
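The direct-sum estimator described above can be sketched in a few lines. The summary does not name the four moment features, so this sketch assumes the first four empirical moments (mean, variance, skewness, kurtosis); `direct_sum_features` is a hypothetical helper illustrating the concatenation of paired score series with those moments, not the paper's exact construction.

```python
import numpy as np

def moment_features(scores):
    """Four empirical moments of a score series (assumed here to be
    mean, variance, skewness, kurtosis; the abstract does not name them)."""
    m = scores.mean()
    c = scores - m
    var = c.var()
    std = np.sqrt(var) + 1e-12  # guard against zero variance
    skew = np.mean(c**3) / std**3
    kurt = np.mean(c**4) / std**4
    return np.array([m, var, skew, kurt])

def direct_sum_features(scores_a, scores_b):
    """Concatenate the two agents' paired score series with their moment
    features, mirroring the direct-sum construction in spirit."""
    return np.concatenate([scores_a, scores_b,
                           moment_features(scores_a),
                           moment_features(scores_b)])

rng = np.random.default_rng(0)
s_a = rng.normal(0.8, 0.05, size=10)  # agent A: 10 time-stamped trust scores
s_b = rng.normal(0.8, 0.05, size=10)  # agent B: same device, independent agent
phi = direct_sum_features(s_a, s_b)
print(phi.shape)  # 10 + 10 + 4 + 4 = 28 features per device
```

With the paper's 10 time steps per agent, each device yields a 28-dimensional feature vector from which the latent embedding is estimated.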


Key Contributions

  • Formalization of the trust embedding reconstruction problem: recovering high-dimensional embeddings from published 1D Siamese trust scores via a direct-sum estimator with four moment features
  • Proof of a unique fixed point for the reconstruction map using a Banach contraction argument, plus non-asymptotic error bounds linking reconstruction accuracy to score-sequence length
  • Synthetic benchmark demonstrating that reconstructed embeddings preserve inter-device geometry (Euclidean and cosine) under Gaussian noise, confirming a practical privacy risk in publishing granular trust scores
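The Banach contraction argument in the second contribution guarantees that iterating the reconstruction map converges to its unique fixed point. A minimal sketch of that iteration scheme, using a toy affine contraction in place of the paper's actual reconstruction map:

```python
import numpy as np

def iterate_to_fixed_point(T, x0, tol=1e-10, max_iter=1000):
    """Banach fixed-point iteration: for a contraction T with Lipschitz
    constant q < 1, x_{k+1} = T(x_k) converges to the unique fixed point."""
    x = x0
    for _ in range(max_iter):
        x_next = T(x)
        if np.linalg.norm(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# Illustrative contraction with q = 0.5 -- NOT the paper's map.
# Its unique fixed point solves x = 0.5*x + b, i.e. x = 2*b.
b = np.array([1.0, 2.0, 3.0])
T = lambda x: 0.5 * x + b

x_star = iterate_to_fixed_point(T, np.zeros(3))
print(x_star)  # converges to [2., 4., 6.]
```

The geometric convergence rate q^k is what underlies non-asymptotic guarantees of this kind; the paper's own bounds additionally tie accuracy to the score-sequence length.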

🛡️ Threat Analysis

Model Inversion Attack

The paper's primary contribution is showing that an adversary can reconstruct high-dimensional latent embeddings (internal model representations) from published scalar trust scores — a textbook embedding/model inversion attack. The paper formalizes the reconstruction algorithm, proves uniqueness of the recovered solution, demonstrates the attack on synthetic benchmarks, and discusses counter-measures (score quantisation, calibrated noise) targeted specifically at this data reconstruction threat.
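Two of the discussed counter-measures can be sketched directly. The helpers below are illustrative only: the paper does not specify its grid step or noise calibration, so the `step`, `sensitivity`, and `epsilon` parameters here are assumptions in the style of differential-privacy noise addition.

```python
import numpy as np

def quantize(score, step=0.1):
    """Score quantisation: publish scores rounded to a coarse grid,
    destroying the fine-grained variation the inversion attack exploits."""
    return np.round(score / step) * step

def laplace_noised(score, sensitivity=1.0, epsilon=1.0, rng=None):
    """Calibrated Laplace noise (DP-style scale = sensitivity/epsilon);
    the paper's exact calibration may differ."""
    rng = rng or np.random.default_rng()
    return score + rng.laplace(0.0, sensitivity / epsilon)

raw = 0.8374
print(f"{quantize(raw):.1f}")  # published as 0.8
noisy = laplace_noised(raw, rng=np.random.default_rng(0))
```

Both defenses trade score utility for privacy: coarser grids and larger noise scales reduce the information available to the direct-sum estimator at the cost of less informative published scores.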


Details

Domains
graph, nlp
Model Types
llm, gnn
Threat Tags
black_box, inference_time
Datasets
synthetic benchmark (20 devices × 10 time steps)
Applications
distributed trust evaluation, networked device security, llm-based trust frameworks