AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective

Zhenyi Wang 1, Siyu Luan 2

Published on arXiv: 2603.24857

  • Input Manipulation Attack (OWASP ML Top 10 ML01)
  • Data Poisoning Attack (OWASP ML Top 10 ML02)
  • Model Inversion Attack (OWASP ML Top 10 ML03)
  • Membership Inference Attack (OWASP ML Top 10 ML04)
  • Model Theft (OWASP ML Top 10 ML05)
  • Output Integrity Attack (OWASP ML Top 10 ML09)
  • Model Poisoning (OWASP ML Top 10 ML10)
  • Prompt Injection (OWASP LLM Top 10 LLM01)
  • Sensitive Information Disclosure (OWASP LLM Top 10 LLM06)

Key Finding

Establishes the first unified taxonomy organizing ML security threats into four principled categories, revealing interdependencies across the data-model attack surface.


As machine learning (ML) systems expand in both scale and functionality, the security landscape has become increasingly complex, with a proliferation of attacks and defenses. However, existing studies largely treat these threats in isolation, lacking a coherent framework that exposes their shared principles and interdependencies. This fragmented view hinders systematic understanding and limits the design of comprehensive defenses. Crucially, the two foundational assets of ML, data and models, are no longer independent; vulnerabilities in one directly compromise the other. The absence of a holistic framework leaves open questions about how these bidirectional risks propagate across the ML pipeline. To address this critical gap, we propose a unified closed-loop threat taxonomy that explicitly frames model-data interactions along four directional axes. Our framework offers a principled lens for analyzing and defending foundation models. The resulting four classes of security threats represent distinct but interrelated categories of attacks: (1) Data→Data (D→D): data decryption attacks and watermark removal attacks; (2) Data→Model (D→M): poisoning, harmful fine-tuning attacks, and jailbreak attacks; (3) Model→Data (M→D): model inversion, membership inference attacks, and training data extraction attacks; (4) Model→Model (M→M): model extraction attacks. Our unified framework elucidates the underlying connections among these security threats and establishes a foundation for developing scalable, transferable, and cross-modal security strategies, particularly within the landscape of foundation models.
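The four directional axes described above can be sketched as a simple lookup structure. The axis names and attack families come from the abstract; the code itself (including the `axis_for` helper) is purely illustrative:

```python
# The survey's closed-loop taxonomy as a lookup table: each directional
# axis maps to the attack families the abstract assigns to it.
THREAT_TAXONOMY = {
    "D->D": ["data decryption", "watermark removal"],
    "D->M": ["poisoning", "harmful fine-tuning", "jailbreak"],
    "M->D": ["model inversion", "membership inference", "training data extraction"],
    "M->M": ["model extraction"],
}

def axis_for(attack: str) -> str:
    """Return the directional axis that covers a given attack family."""
    for axis, attacks in THREAT_TAXONOMY.items():
        if attack in attacks:
            return axis
    raise KeyError(attack)
```

For example, `axis_for("jailbreak")` returns `"D->M"`, reflecting that jailbreaks are data-driven manipulations of model behavior.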


Key Contributions

  • Proposes a unified closed-loop threat taxonomy organizing ML security along four directional axes: D→D, D→M, M→D, M→M
  • Exposes the bidirectional dependencies between data and model security, showing how vulnerabilities propagate across the ML pipeline
  • Provides a comprehensive framework for analyzing foundation model security, with emphasis on cross-modal and transferable defense strategies

🛡️ Threat Analysis

Input Manipulation Attack

Survey covers adversarial examples and jailbreak attacks as part of the data-to-model threat category.
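As a minimal sketch of the adversarial-example idea, the following applies a single FGSM-style signed-gradient step. The plain-Python lists and the `eps` value are illustrative simplifications; real attacks operate on tensors with framework autograd:

```python
def fgsm_perturb(x, grad, eps=0.1):
    """One FGSM-style step: nudge each input feature by eps in the
    direction that increases the model's loss (sign of the gradient).
    Toy version on plain lists; real attacks use autograd frameworks."""
    def sign(g):
        return (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]
```

The perturbation is bounded by `eps` per feature, which is why such examples can stay visually indistinguishable from the clean input while flipping the model's prediction.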

Data Poisoning Attack

Survey includes poisoning attacks as a core component of data-to-model threats.
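A toy illustration of this D→M threat is label-flipping poisoning, where an attacker corrupts a fraction of training labels. The `flip_labels` helper and its parameters are illustrative only; real poisoning attacks are far more targeted:

```python
import random

def flip_labels(dataset, rate=0.1, num_classes=2, seed=0):
    """Toy label-flipping poisoning: corrupt a fraction of training labels.

    `dataset` is a list of (features, label) pairs. Each label is flipped
    to a different class with probability `rate`.
    """
    rng = random.Random(seed)
    poisoned = []
    for x, y in dataset:
        if rng.random() < rate:
            y = (y + 1) % num_classes  # reassign to a different class
        poisoned.append((x, y))
    return poisoned
```

A model trained on the returned dataset inherits the corrupted decision boundary, which is the essence of the data-to-model direction in the taxonomy.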

Model Inversion Attack

Survey covers model inversion and training data extraction attacks in the model-to-data category.

Membership Inference Attack

Survey includes membership inference attacks as part of model-to-data privacy threats.
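One of the simplest forms this M→D attack takes is a loss-threshold heuristic: samples on which the model achieves unusually low loss are guessed to be training members. The threshold value below is a placeholder; practical attacks calibrate it per model:

```python
def membership_inference(loss, threshold=0.5):
    """Toy loss-threshold membership inference: guess that a sample was
    in the training set if the model's loss on it is below `threshold`.
    The threshold is illustrative; real attacks calibrate it per model."""
    return loss < threshold
```

The heuristic works because models tend to overfit, assigning lower loss to examples they were trained on than to unseen ones.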

Model Theft

Survey covers model extraction attacks in the model-to-model threat category.
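The core loop of this M→M threat can be sketched as follows: the attacker labels self-chosen inputs with the victim model's predictions, producing a dataset on which a surrogate can be trained. The `extract_surrogate` helper is a hypothetical illustration, not the survey's method:

```python
def extract_surrogate(victim, queries):
    """Toy model-extraction step: label attacker-chosen inputs with the
    victim's predictions, yielding training data for a surrogate model.
    `victim` is any callable mapping an input to a label."""
    return [(x, victim(x)) for x in queries]
```

Usage: with `victim = lambda x: int(x > 0)`, calling `extract_surrogate(victim, [-1.0, 2.0])` yields labeled pairs that mirror the victim's decision boundary; query-efficient attacks choose `queries` adaptively.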

Output Integrity Attack

Survey includes data decryption attacks and watermark removal attacks in the data-to-data category.

Model Poisoning

Survey covers harmful fine-tuning attacks as part of data-to-model threats.


Details

Domains
vision, nlp, multimodal
Model Types
llm, transformer, cnn, diffusion, multimodal
Threat Tags
training_time, inference_time, white_box, black_box