Attack by Unlearning: Unlearning-Induced Adversarial Attacks on Graph Neural Networks
Jiahao Zhang, Yilong Wang, Suhang Wang
Published on arXiv
2603.18570
Model Skewing
OWASP ML Top 10 — ML08
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
Small, carefully designed unlearning requests induce significant accuracy degradation on GNN models across multiple benchmark datasets and unlearning algorithms
Unlearning Corruption Attack
Novel technique introduced
Graph neural networks (GNNs) are widely used for learning from graph-structured data in domains such as social networks, recommender systems, and financial platforms. To comply with privacy regulations like the GDPR, CCPA, and PIPEDA, approximate graph unlearning, which aims to remove the influence of specific data points from trained models without full retraining, has become an increasingly important component of trustworthy graph learning. However, approximate unlearning often incurs subtle performance degradation, which can have negative and unintended side effects. In this work, we show that such degradation can be amplified into an adversarial attack. We introduce the notion of **unlearning corruption attacks**, where an adversary injects carefully chosen nodes into the training graph and later requests their deletion. Because deletion requests are legally mandated and cannot be denied, this attack surface is both unavoidable and stealthy: the model performs normally during training, and accuracy collapses only after unlearning is applied. Technically, we formulate the attack as a bi-level optimization problem; to overcome the challenges of black-box unlearning and label scarcity, we approximate the unlearning process via gradient-based updates and employ a surrogate model to generate pseudo-labels for the optimization. Extensive experiments across benchmarks and unlearning algorithms demonstrate that small, carefully designed unlearning requests can induce significant accuracy degradation, raising urgent concerns about the robustness of GNN unlearning under real-world regulatory demands. The source code will be released upon paper acceptance.
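The bi-level structure described in the abstract can be written schematically as follows. The notation here is ours, chosen for illustration, and may differ from the paper's: $\Delta$ denotes the injected nodes, $\mathcal{G}$ the clean graph, and $\mathrm{Unlearn}$ the (black-box) unlearning procedure that the paper approximates with gradient-based updates.

```latex
\max_{\Delta}\; \mathcal{L}\bigl(f_{\theta_u};\, \mathcal{G}\bigr)
\quad \text{s.t.} \quad
\theta^{*} \;=\; \arg\min_{\theta}\, \mathcal{L}_{\mathrm{train}}\bigl(f_{\theta};\, \mathcal{G} \cup \Delta\bigr),
\qquad
\theta_u \;=\; \mathrm{Unlearn}\bigl(\theta^{*},\, \Delta\bigr)
```

The outer problem chooses injections $\Delta$ that maximize post-unlearning loss on the clean graph; the inner problem is ordinary training on the poisoned graph, followed by the mandated deletion of $\Delta$. Because true labels for the outer objective are scarce, the paper substitutes pseudo-labels produced by a surrogate model.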
Key Contributions
- First work to demonstrate adversarial attacks via unlearning corruption on GNNs
- Bi-level optimization framework for black-box unlearning attacks using gradient approximation and pseudo-labels
- Demonstrates that small unlearning requests can cause significant accuracy degradation across multiple GNN architectures and unlearning algorithms
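To build intuition for how gradient-approximated unlearning can be weaponized, here is a minimal, self-contained sketch on a toy logistic-regression "model". Graph structure, the surrogate model, and pseudo-labels are all omitted, and `approx_unlearn`, the synthetic data, and every name below are our own illustrative constructions, not the paper's code. The injected points are correctly labelled and therefore benign at training time, yet deleting them via gradient ascent on their loss collapses accuracy:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # Clip to avoid overflow once the attack drives weights to extremes.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500.0, 500.0)))

def grad(w, X, y):
    """Mean gradient of the logistic loss."""
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def train(X, y, steps=100, lr=0.2):
    """Ordinary gradient-descent training."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * grad(w, X, y)
    return w

def approx_unlearn(w, X_del, y_del, steps=400, lr=2.0):
    """Gradient-ascent 'unlearning': reverse the deleted points'
    influence by maximizing their loss -- a crude stand-in for
    influence-style approximate unlearning."""
    for _ in range(steps):
        w += lr * grad(w, X_del, y_del)
    return w

def accuracy(w, X, y):
    return float(np.mean((sigmoid(X @ w) > 0.5) == y))

# Clean two-class "node features" (graph edges omitted for brevity).
n = 100
X_clean = np.vstack([rng.normal(-2.0, 1.0, (n, 2)),
                     rng.normal(2.0, 1.0, (n, 2))])
y_clean = np.concatenate([np.zeros(n), np.ones(n)])

# Adversarial injections: correctly labelled, hence stealthy during
# training, but positioned so that maximizing their loss flips the model.
X_adv = rng.normal(1.0, 0.1, (10, 2))
y_adv = np.ones(10)

w = train(np.vstack([X_clean, X_adv]), np.concatenate([y_clean, y_adv]))
acc_before = accuracy(w, X_clean, y_clean)   # model looks healthy

w_after = approx_unlearn(w, X_adv, y_adv)    # mandated deletion request
acc_after = accuracy(w_after, X_clean, y_clean)  # post-deletion collapse
```

The same qualitative effect, a healthy pre-deletion model followed by a post-deletion collapse, is what the paper demonstrates at scale across GNN architectures and unlearning algorithms.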
🛡️ Threat Analysis
The attack vector is injecting carefully chosen nodes into the training graph (data poisoning). While the damage manifests after unlearning, the initial attack is training-time data injection designed to corrupt model behavior.
The attack manipulates model behavior over time through a multi-stage process: nodes are injected during training, and their later deletion triggers the performance collapse. This temporal, incremental manipulation exploits the unlearning feedback loop, with the model behaving normally during training and degrading only once unlearning is applied, which fits ML08's (Model Skewing) core threat of gradual manipulation through temporal processes.