A Systematic Study of Model Extraction Attacks on Graph Foundation Models
Haoyan Xu 1, Ruizhi Qian 1, Jiate Li 1, Yushun Dong 2, Minghao Lin 1, Hanson Yan 1, Zhengtao Yao 1, Qinghua Liu 3, Junhao Dong 4, Ruopeng Huang 1, Yue Zhao 1, Mengyuan Li 1
Published on arXiv: 2511.11912
Model Theft
OWASP ML Top 10 — ML05
Key Finding
A surrogate encoder approximates the victim GFM using only 0.07% of the original training time, with an average classification accuracy gap of 0.0015 across seven datasets.
Supervised Embedding Regression
Novel technique introduced
Graph machine learning has advanced rapidly in tasks such as link prediction, anomaly detection, and node classification. As models scale up, pretrained graph models have become valuable intellectual assets because they encode extensive computation and domain expertise. Building on these advances, Graph Foundation Models (GFMs) mark a major step forward by jointly pretraining graph and text encoders on massive and diverse data. This unifies structural and semantic understanding, enables zero-shot inference, and supports applications such as fraud detection and biomedical analysis. However, the high pretraining cost and broad cross-domain knowledge in GFMs also make them attractive targets for model extraction attacks (MEAs). Prior work has focused only on small graph neural networks trained on a single graph, leaving the security implications for large-scale and multimodal GFMs largely unexplored. This paper presents the first systematic study of MEAs against GFMs. We formalize a black-box threat model and define six practical attack scenarios covering domain-level and graph-specific extraction goals, architectural mismatch, limited query budgets, partial node access, and training data discrepancies. To instantiate these attacks, we introduce a lightweight extraction method that trains an attacker encoder using supervised regression of graph embeddings. Even without contrastive pretraining data, this method learns an encoder that stays aligned with the victim text encoder and preserves its zero-shot inference ability on unseen graphs. Experiments on seven datasets show that the attacker can approximate the victim model using only a tiny fraction of its original training cost, with almost no loss in accuracy. These findings reveal that GFMs greatly expand the MEA surface and highlight the need for deployment-aware security defenses in large-scale graph learning systems.
Key Contributions
- First systematic taxonomy of six model extraction attack scenarios against Graph Foundation Models, covering architectural mismatch, limited query budgets, partial node access, and training data discrepancies
- Lightweight supervised embedding regression framework that trains a surrogate graph encoder aligned to the victim's text encoder without access to contrastive pretraining data
- Empirical and theoretical demonstration that GFMs can be cloned at 0.07% of the original training cost with an average accuracy gap of only 0.0015 across seven datasets
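The core attack, supervised embedding regression, can be illustrated with a minimal sketch. All names here are hypothetical: the victim is reduced to an opaque function returning embeddings, and a single linear least-squares fit stands in for the attacker's graph encoder; the paper's actual surrogate is a trained GNN, not a linear map.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical victim GFM graph encoder (black-box): the attacker only
# --- observes output embeddings for queried inputs, never the weights.
W_victim = rng.normal(size=(16, 8))  # hidden victim parameters (assumed)

def victim_api(node_features):
    """Black-box query: node features in, graph embeddings out."""
    return np.tanh(node_features @ W_victim)

# Step 1: query the victim with the attacker's own (public) node features.
queries = rng.normal(size=(200, 16))
victim_embeddings = victim_api(queries)  # supervision signal, "stolen" labels

# Step 2: fit a surrogate encoder by regressing directly on the returned
# embeddings. Least squares stands in for gradient-trained surrogate weights.
W_surrogate, *_ = np.linalg.lstsq(queries, victim_embeddings, rcond=None)

def surrogate_encoder(node_features):
    return node_features @ W_surrogate

# Step 3: measure alignment on held-out queries (embedding regression error).
held_out = rng.normal(size=(50, 16))
gap = np.mean((surrogate_encoder(held_out) - victim_api(held_out)) ** 2)
print(f"mean squared embedding gap: {gap:.4f}")
```

The key point the sketch preserves is that the supervision target is the victim's embedding itself, so no contrastive pretraining data or labels are needed, only query access.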
🛡️ Threat Analysis
The core contribution is a model extraction attack framework that clones GFMs by querying their black-box API and regressing on the returned graph embeddings, reconstructing the victim model's zero-shot inference capability: a direct theft of intellectual property.
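Why stealing the graph encoder suffices for zero-shot inference: in a GFM, classification is done by comparing a graph embedding against the frozen text encoder's embeddings of class prompts, so a surrogate aligned to the same space inherits that ability. A minimal sketch, assuming cosine-similarity matching and made-up embedding dimensions (the prompt strings and `dim` are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8  # assumed shared embedding dimension

# Frozen victim text-encoder embeddings for two class prompts, e.g.
# "a fraudulent account" vs. "a benign account" (values are synthetic).
class_text_emb = rng.normal(size=(2, dim))
class_text_emb /= np.linalg.norm(class_text_emb, axis=1, keepdims=True)

def zero_shot_predict(graph_emb, text_embs):
    """Assign each node the class whose text embedding is most cosine-similar."""
    g = graph_emb / np.linalg.norm(graph_emb, axis=1, keepdims=True)
    return np.argmax(g @ text_embs.T, axis=1)

# A surrogate graph embedding that lands near class 0's text embedding
# is labeled class 0, with no task-specific training.
node_emb = class_text_emb[0:1] + 0.05 * rng.normal(size=(1, dim))
print(zero_shot_predict(node_emb, class_text_emb))  # expected: [0]
```

This is why the paper stresses that the surrogate "stays aligned with the victim text encoder": alignment, not label supervision, is what transfers zero-shot behavior to unseen graphs.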