ML Security Papers

Latest papers

245 papers

attack arXiv Apr 5, 2026 · 3d ago

Towards Unveiling Vulnerabilities of Large Reasoning Models in Machine Unlearning

Aobo Chen, Chenxu Zhao, Chenglin Miao et al. · Iowa State University

Adversarial attack on LLM unlearning that forces incorrect answers while generating convincing but misleading reasoning traces

Model Inversion Attack Sensitive Information Disclosure nlp

PDF

attack arXiv Apr 1, 2026 · 7d ago

Enhancing Gradient Inversion Attacks in Federated Learning via Hierarchical Feature Optimization

Hao Fang, Wenbo Yu, Bin Chen et al. · Tsinghua University · Harbin Institute of Technology

GAN-based gradient inversion attack reconstructing client training data from FL gradients via hierarchical feature optimization

Model Inversion Attack vision federated-learning

PDF

defense arXiv Mar 30, 2026 · 9d ago

FedFG: Privacy-Preserving and Robust Federated Learning via Flow-Matching Generation

Ruiyang Wang, Rong Pan, Zhengan Yao · Sun Yat-Sen University

Federated learning defense using flow-matching generators to prevent gradient inversion and detect poisoning attacks simultaneously

Data Poisoning Attack Model Inversion Attack federated-learning vision

PDF Code

defense arXiv Mar 27, 2026 · 12d ago

Towards Privacy-Preserving Federated Learning using Hybrid Homomorphic Encryption

Ivan Costa, Pedro Correia, Ivone Amorim et al. · Polytechnic of Porto

Cryptographic key protection mechanisms for federated learning that defend against malicious clients stealing private model updates

Model Inversion Attack federated-learning

PDF

attack arXiv Mar 25, 2026 · 14d ago

Uncovering Memorization in Timeseries Imputation models: LBRM Membership Inference and its link to attribute Leakage

Faiz Taleb, Ivan Gazeau, Maryline Laurent · EDF · Télécom SudParis +1 more

Membership and attribute inference attacks on time-series imputation models, achieving 0.90 AUROC via reference-model comparison

Membership Inference Attack Model Inversion Attack timeseries

PDF

survey arXiv Mar 25, 2026 · 14d ago

AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective

Zhenyi Wang, Siyu Luan · University of Central Florida · University of Copenhagen

Unified taxonomy of ML security threats organizing attacks into data-to-data, data-to-model, model-to-data, and model-to-model categories

Input Manipulation Attack Data Poisoning Attack Model Inversion Attack Membership Inference Attack Model Theft Output Integrity Attack Model Poisoning Prompt Injection Sensitive Information Disclosure vision nlp multimodal

PDF

defense arXiv Mar 19, 2026 · 20d ago

Revisiting Label Inference Attacks in Vertical Federated Learning: Why They Are Vulnerable and How to Defend

Yige Liu, Dexuan Xu, Zimai Guo et al. · Peking University · Zhongguancun Laboratory

Reveals that label inference attacks in VFL succeed due to feature-label alignment, and proposes a zero-overhead cut-layer defense

Model Inversion Attack federated-learning

PDF

defense arXiv Mar 19, 2026 · 20d ago

A Concept is More Than a Word: Diversified Unlearning in Text-to-Image Diffusion Models

Duc Hao Pham, Van Duy Truong, Duy Khanh Dinh et al. · VNPT Group

Distributional unlearning framework using diverse prompts to more robustly erase visual concepts from diffusion models against recovery attacks

Model Inversion Attack vision generative multimodal

PDF

attack arXiv Mar 18, 2026 · 21d ago

ARES: Scalable and Practical Gradient Inversion Attack in Federated Learning through Activation Recovery

Zirui Gong, Leo Yu Zhang, Yanjun Zhang et al. · Griffith University · Swinburne University of Technology +2 more

Gradient inversion attack reconstructing training data from federated learning updates via sparse activation recovery without architectural changes

Model Inversion Attack vision federated-learning

PDF

attack arXiv Mar 18, 2026 · 21d ago

TINA: Text-Free Inversion Attack for Unlearned Text-to-Image Diffusion Models

Qianlong Xiang, Miao Zhang, Haoyu Zhang et al. · Harbin Institute of Technology · City University of Hong Kong +3 more

Text-free inversion attack that recovers supposedly erased concepts from diffusion models by exploiting persistent visual knowledge

Model Inversion Attack vision generative

PDF

attack arXiv Mar 17, 2026 · 22d ago

SOMP: Scalable Gradient Inversion for Large Language Models via Subspace-Guided Orthogonal Matching Pursuit

Yibo Li, Qiongxiu Li · Politecnico di Milano · Aalborg University

Scalable gradient inversion attack recovering private training text from aggregated LLM gradients in federated learning settings

Model Inversion Attack Sensitive Information Disclosure nlp federated-learning

PDF

benchmark arXiv Mar 12, 2026 · 27d ago

Understanding Disclosure Risk in Differential Privacy with Applications to Noise Calibration and Auditing (Extended Version)

Patricia Guerra-Balboa, Annika Sauer, Héber H. Arcolezi et al. · Karlsruhe Institute of Technology · Inria Centre at the University Grenoble Alpes +1 more

Proposes reconstruction advantage metric unifying MIA, AIA, and DRA to tightly bound DP disclosure risk and improve auditing

Model Inversion Attack Membership Inference Attack tabular

PDF

benchmark arXiv Mar 9, 2026 · 4w ago

The Conundrum of Trustworthy Research on Attacking Personally Identifiable Information Removal Techniques

Sebastian Ochs, Ivan Habernal · Trustworthy Human Language Technologies · Technical University of Darmstadt +2 more

Critiques PII reconstruction attack evaluations, showing data leakage and LLM memorization inflate reported attack success rates

Model Inversion Attack Sensitive Information Disclosure nlp

PDF

benchmark arXiv Mar 9, 2026 · 4w ago

Quantifying Memorization and Privacy Risks in Genomic Language Models

Alexander Nemecek, Wenbiao Li, Xiaoqian Jiang et al. · Case Western Reserve University · UTHealth +1 more

Multi-vector framework quantifying memorization, canary extraction, and membership inference risks across genomic language model architectures

Model Inversion Attack Membership Inference Attack nlp

PDF

defense arXiv Mar 9, 2026 · 4w ago

Client-Cooperative Split Learning

Haiyu Deng, Yanna Jiang, Guangsheng Yu et al. · University of Technology Sydney · CSIRO Data61 +1 more

Defends split learning against activation inversion, label clustering, and model extraction via DP and chained watermarking

Model Inversion Attack Model Theft federated-learning vision

PDF

attack arXiv Mar 6, 2026 · 4w ago

How Private Are DNA Embeddings? Inverting Foundation Model Representations of Genomic Sequences

Sofiane Ouaari, Jules Kreuer, Nico Pfeifer · University of Tuebingen

Demonstrates embedding inversion attacks reconstructing private DNA sequences from DNABERT-2, Evo 2, and NTv2 EaaS embeddings with >90% similarity

Model Inversion Attack nlp

PDF Code

defense arXiv Mar 5, 2026 · 4w ago

Good-Enough LLM Obfuscation (GELO)

Anatoly Belikov, Ilya Fedotov · SingularityNET Foundation · Singularity Compute

Defends LLM prompt privacy on shared accelerators by obfuscating hidden states with per-batch invertible mixing inside a TEE

Model Inversion Attack Sensitive Information Disclosure nlp

PDF

defense arXiv Mar 5, 2026 · 4w ago

Balancing Privacy-Quality-Efficiency in Federated Learning through Round-Based Interleaving of Protection Techniques

Yenan Wang, Carla Fabiana Chiasserini, Elad Michael Schiller · Chalmers University of Technology

Defends federated learning against gradient reconstruction attacks by interleaving DP, homomorphic encryption, and synthetic data rounds

Model Inversion Attack federated-learning vision

PDF

defense arXiv Mar 4, 2026 · 5w ago

PTOPOFL: Privacy-Preserving Personalised Federated Learning via Persistent Homology

Kelly L Vomo-Donfack, Adryel Hoszu, Grégory Ginot et al. · Université Sorbonne Paris Nord · Instituto de Hortofruticultura Subtropical y Mediterránea La Mayora

Replaces FL gradient sharing with persistent homology descriptors to provably harden against data reconstruction and Byzantine poisoning attacks

Model Inversion Attack Data Poisoning Attack federated-learning

PDF Code

defense arXiv Mar 4, 2026 · 5w ago

Privacy-Preserving Collaborative Medical Image Segmentation Using Latent Transform Networks

Saheed Ademola Bello, Muhammad Shahid Jabbar, Muhammad Sohail Ibrahim et al. · King Fahd University of Petroleum & Minerals

Defends collaborative medical segmentation latent spaces against inversion and membership inference via keyed orthogonal transforms

Model Inversion Attack Membership Inference Attack vision federated-learning

PDF


