Reconstructing Trust Embeddings from Siamese Trust Scores: A Direct-Sum Approach with Fixed-Point Semantics
Faruk Alpay, Taylan Alpay, Bugra Kilictas
Published on arXiv: 2508.01479
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
Published scalar trust scores from two independent ChatGPT agents contain sufficient information to reconstruct approximate latent device embeddings that preserve inter-device geometry, demonstrating a concrete privacy leakage risk in Siamese trust evaluation systems.
Direct-Sum Embedding Reconstruction
Novel technique introduced
We study the inverse problem of reconstructing high-dimensional trust embeddings from the one-dimensional Siamese trust scores that many distributed-security frameworks expose. Starting from two independent agents that publish time-stamped similarity scores for the same set of devices, we formalise the estimation task, derive an explicit direct-sum estimator that concatenates paired score series with four moment features, and prove that the resulting reconstruction map admits a unique fixed point under a contraction argument rooted in Banach theory. A suite of synthetic benchmarks (20 devices × 10 time steps) confirms that, even in the presence of Gaussian noise, the recovered embeddings preserve inter-device geometry as measured by Euclidean and cosine metrics; we complement these experiments with non-asymptotic error bounds that link reconstruction accuracy to score-sequence length. Beyond methodology, the paper demonstrates a practical privacy risk: publishing granular trust scores can leak latent behavioural information about both devices and evaluation models. We therefore discuss counter-measures (score quantisation, calibrated noise, obfuscated embedding spaces) and situate them within wider debates on transparency versus confidentiality in networked AI systems. All datasets, reproduction scripts, and extended proofs accompany the submission so that results can be verified without proprietary code.
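The abstract's direct-sum construction (concatenating two agents' paired score series with four moment features each) can be sketched as follows. This is a hypothetical illustration: the paper does not publish its exact feature set here, so the choice of mean, standard deviation, skewness, and excess kurtosis as the four moments is an assumption.

```python
import numpy as np

def direct_sum_features(scores_a, scores_b):
    """Build the direct-sum estimator input for one device: each agent's
    score series concatenated with four moment features.

    Assumed moments (not confirmed by the source): mean, standard
    deviation, skewness, excess kurtosis.
    """
    parts = []
    for s in (np.asarray(scores_a, float), np.asarray(scores_b, float)):
        mu, sigma = s.mean(), s.std()
        z = (s - mu) / sigma if sigma > 0 else np.zeros_like(s)
        moments = [mu, sigma, (z ** 3).mean(), (z ** 4).mean() - 3.0]
        parts.append(np.concatenate([s, moments]))
    # Direct sum R^T (+) R^4 (+) R^T (+) R^4 realised as concatenation.
    return np.concatenate(parts)

# 10 time steps per device, matching the paper's synthetic benchmark.
rng = np.random.default_rng(0)
vec = direct_sum_features(rng.uniform(0, 1, 10), rng.uniform(0, 1, 10))
print(vec.shape)  # (28,) = 2 * (10 + 4)
```

With T = 10 time steps, each device yields a 28-dimensional feature vector, which the paper's reconstruction map would then lift back toward the latent embedding space.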
Key Contributions
- Formalization of the trust embedding reconstruction problem: recovering high-dimensional embeddings from published 1D Siamese trust scores via a direct-sum estimator with four moment features
- Proof of a unique fixed point for the reconstruction map using a Banach contraction argument, plus non-asymptotic error bounds linking reconstruction accuracy to score-sequence length
- Synthetic benchmark demonstrating that reconstructed embeddings preserve inter-device geometry (Euclidean and cosine) under Gaussian noise, confirming a practical privacy risk in publishing granular trust scores
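The fixed-point guarantee in the second contribution rests on the Banach fixed-point theorem: if the reconstruction map is a contraction, iterating it from any starting point converges to a unique fixed point. A minimal sketch of that iteration scheme, using a toy affine contraction in place of the paper's actual reconstruction map:

```python
import numpy as np

def fixed_point(T, x0, tol=1e-10, max_iter=1000):
    """Banach-style fixed-point iteration: apply T until successive
    iterates differ by less than tol. If T is a contraction with
    Lipschitz constant q < 1, convergence to the unique fixed point
    is guaranteed from any starting point x0."""
    x = np.asarray(x0, float)
    for _ in range(max_iter):
        x_next = T(x)
        if np.linalg.norm(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# Toy contraction T(x) = 0.5*x + b (Lipschitz constant 0.5).
# Its unique fixed point solves x = 0.5*x + b, i.e. x* = 2*b.
b = np.array([1.0, -2.0, 0.5])
x_star = fixed_point(lambda x: 0.5 * x + b, np.zeros(3))
print(np.allclose(x_star, 2 * b))  # True
```

The paper's contribution is proving the contraction property for its specific reconstruction map; the iteration mechanics above are standard.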
🛡️ Threat Analysis
The paper's primary contribution is showing that an adversary can reconstruct high-dimensional latent embeddings (internal model representations) from published scalar trust scores — a textbook embedding/model inversion attack. The paper formalizes the reconstruction algorithm, proves its uniqueness via a fixed-point argument, demonstrates it on synthetic benchmarks, and discusses counter-measures (score quantisation, calibrated noise) specifically targeted at this data reconstruction threat.