Latest papers

8 papers
benchmark · arXiv · Mar 12, 2026

Understanding Disclosure Risk in Differential Privacy with Applications to Noise Calibration and Auditing (Extended Version)

Patricia Guerra-Balboa, Annika Sauer, Héber H. Arcolezi et al. · Karlsruhe Institute of Technology · Inria Centre at the University Grenoble Alpes +1 more

Proposes a reconstruction-advantage metric that unifies MIA, AIA, and DRA to tightly bound disclosure risk under DP and improve auditing

Model Inversion Attack · Membership Inference Attack · tabular
PDF
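The advantage framing the paper generalizes can be illustrated in miniature. Below is only the classic membership-inference advantage (TPR minus FPR of a threshold attack on per-example loss), with entirely hypothetical loss values; the paper's reconstruction-advantage metric extends this idea to attribute inference and data reconstruction.

```python
# Toy membership-inference "advantage" (hypothetical data, not the
# paper's metric): TPR - FPR of a threshold attack on loss values.

def mia_advantage(member_losses, nonmember_losses, threshold):
    # Attack rule: predict "member" when the model's loss is below threshold.
    tpr = sum(l < threshold for l in member_losses) / len(member_losses)
    fpr = sum(l < threshold for l in nonmember_losses) / len(nonmember_losses)
    return tpr - fpr

# Training-set members tend to have lower loss than non-members.
members = [0.1, 0.2, 0.15, 0.3]
nonmembers = [0.8, 0.9, 0.6, 0.7]
adv = mia_advantage(members, nonmembers, threshold=0.5)
print(adv)  # 1.0: the threshold perfectly separates the two groups
```

An advantage of 0 means the attacker does no better than guessing; DP noise calibration aims to keep this quantity provably small.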
defense · arXiv · Feb 9, 2026

RIFLE: Robust Distillation-based FL for Deep Model Deployment on Resource-Constrained IoT Networks

Pouria Arefijamal, Mahdi Ahmadlou, Bardia Safaei et al. · Sharif University of Technology · Karlsruhe Institute of Technology

Defends federated learning on IoT networks against poisoning attacks via KL-divergence-based client validation and knowledge-distillation aggregation

Data Poisoning Attack · federated-learning · vision
PDF
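A KL-divergence client-validation rule of the kind this summary describes can be sketched as follows. The threshold, data, and flagging rule here are hypothetical stand-ins, not RIFLE's actual mechanism: clients whose softmax outputs on a small server-side validation set diverge from the global model's are excluded from aggregation.

```python
# Sketch (assumed mechanism): flag federated clients whose predicted
# distributions diverge, in KL terms, from the global model's.
import math

def kl(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def flag_clients(global_probs, client_probs, tau=0.5):
    # global_probs: distributions from the global model on validation data;
    # client_probs: {client_id: matching distributions}. tau is made up.
    flagged = []
    for cid, probs in client_probs.items():
        score = sum(kl(g, c) for g, c in zip(global_probs, probs)) / len(global_probs)
        if score > tau:
            flagged.append(cid)  # exclude from this aggregation round
    return flagged

g = [[0.9, 0.1], [0.2, 0.8]]
clients = {
    "benign": [[0.85, 0.15], [0.25, 0.75]],
    "poisoned": [[0.1, 0.9], [0.9, 0.1]],  # predictions flipped
}
print(flag_clients(g, clients))  # ['poisoned']
```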
defense · arXiv · Feb 6, 2026

PurSAMERE: Reliable Adversarial Purification via Sharpness-Aware Minimization of Expected Reconstruction Error

Vinh Hoang, Sebastian Krumscheid, Holger Rauhut et al. · RWTH-Aachen University · Forschungszentrum Jülich +3 more

Deterministic adversarial purification via sharpness-aware minimization that resists full-knowledge white-box attacks without relying on gradient obfuscation

Input Manipulation Attack vision
PDF
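The sharpness-aware minimization pattern underlying this defense can be shown in miniature: perturb the parameters toward higher loss, then descend using the gradient taken at the perturbed point. The 1-D toy loss and constants below are hypothetical; the paper applies the pattern to an expected reconstruction-error objective, not this quadratic.

```python
# Minimal sharpness-aware minimization (SAM) step on a toy 1-D loss
# L(w) = (w - 3)^2; illustrative only, not the paper's objective.

def grad(w):
    # d/dw of L(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

def sam_step(w, lr=0.1, rho=0.05):
    g = grad(w)
    eps = rho * g / (abs(g) + 1e-12)   # ascent direction, norm rho
    g_sharp = grad(w + eps)            # gradient at the perturbed point
    return w - lr * g_sharp            # descend using the "sharp" gradient

w = 0.0
for _ in range(200):
    w = sam_step(w)
print(w)  # settles near the minimizer w = 3
```

Because the update always uses the worst-case-perturbed gradient, SAM favors flat minima, which is the property these purification and pruning papers exploit.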
attack · arXiv · Jan 8, 2026

Higher-Order Adversarial Patches for Real-Time Object Detectors

Jens Bayer, Stefan Becker, David Münch et al. · Fraunhofer IOSB · Karlsruhe Institute of Technology

Iterative higher-order adversarial patches against YOLOv10 transfer across models more strongly than lower-order patches, and defeat adversarial training used as the sole defense

Input Manipulation Attack vision
PDF · Code
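The iterative patch-optimization loop such attacks rely on can be sketched with a toy "detector": gradient steps that lower the model's score, applied only to a patch region of the input and clipped to a valid pixel range. The linear score, dimensions, and rates below are all made up; the paper optimizes patches against real object detectors.

```python
# Toy iterative patch optimization (not the paper's method or models):
# minimize a linear "detector" score by gradient descent restricted to
# a patch region, clipping pixels to [0, 1].

def score(x, w):
    # Stand-in for a detector's objectness score: a dot product.
    return sum(xi * wi for xi, wi in zip(x, w))

def optimize_patch(x, w, patch_idx, steps=50, lr=0.1):
    x = list(x)  # leave the caller's input untouched
    for _ in range(steps):
        # For a linear score, d(score)/d(x_i) = w_i; descend on patch pixels only.
        for i in patch_idx:
            x[i] = min(1.0, max(0.0, x[i] - lr * w[i]))
    return x

w = [0.5, -0.2, 0.8, 0.1]    # fixed "detector" weights
x = [0.5, 0.5, 0.5, 0.5]     # clean input
adv = optimize_patch(x, w, patch_idx=[2, 3])  # patch covers pixels 2-3
print(score(x, w), score(adv, w))  # the patched input scores lower
```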
survey · arXiv · Dec 10, 2025

Chasing Shadows: Pitfalls in LLM Security Research

Jonathan Evertz, Niklas Risse, Nicolai Neuer et al. · CISPA Helmholtz Center for Information Security · Max Planck Institute for Security and Privacy +4 more

Identifies nine methodological pitfalls in LLM security research, with at least one appearing in every one of the 72 surveyed papers, and case studies showing how each pitfall distorts results

Data Poisoning Attack · Prompt Injection · nlp
2 citations · PDF
defense · arXiv · Oct 21, 2025

S2AP: Score-space Sharpness Minimization for Adversarial Pruning

Giorgio Piras, Qi Zhao, Fabio Brau et al. · University of Cagliari · Karlsruhe Institute of Technology

A plug-in sharpness-minimization method for adversarial pruning that stabilizes mask selection and makes pruned models more robust to adversarial attacks

Input Manipulation Attack vision
PDF
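The mask-selection step that S2AP stabilizes is, in its generic form, a top-k choice over importance scores. The sketch below shows only that baseline selection with hypothetical scores; the paper's contribution, flattening the score landscape so this choice stops flip-flopping between runs, is not reproduced here.

```python
# Generic score-based pruning mask selection (not S2AP itself):
# keep the top (1 - sparsity) fraction of weights by importance score.

def prune_mask(scores, sparsity=0.5):
    n_keep = int(len(scores) * (1.0 - sparsity))
    # Rank weight indices by score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [1 if i in keep else 0 for i in range(len(scores))]

scores = [0.9, 0.1, 0.4, 0.8]            # hypothetical importance scores
print(prune_mask(scores, sparsity=0.5))  # [1, 0, 0, 1]
```

Small score perturbations near the top-k cutoff can flip mask entries, which is exactly the instability a sharpness-minimized score landscape is meant to reduce.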
defense · arXiv · Sep 5, 2025

Robust Experts: the Effect of Adversarial Training on CNNs with Sparse Mixture-of-Experts Layers

Svetlana Pavlitska, Haixi Fan, Konstantin Ditschuneit et al. · Karlsruhe Institute of Technology · FZI Research Center for Information Technology

Sparse MoE layers in CNNs boost adversarial robustness under PGD/AutoPGD; routing collapse creates unexpectedly robust expert subpaths

Input Manipulation Attack vision
PDF · Code
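The sparse mixture-of-experts routing studied here follows the standard top-k pattern: a gate scores each expert, only the k highest-scoring experts run, and their outputs are combined with softmax-normalized gate weights. The gate logits and experts below are hypothetical toys, not the paper's CNN layers.

```python
# Minimal top-k sparse mixture-of-experts routing (illustrative only).
import math

def top_k_route(gate_logits, k=1):
    # Indices of the k experts with the highest gate score.
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    return ranked[:k]

def moe_forward(x, experts, gate_logits, k=1):
    # Sparse MoE: only selected experts run; outputs are softmax-weighted.
    idx = top_k_route(gate_logits, k)
    exps = [math.exp(gate_logits[i]) for i in idx]
    z = sum(exps)
    return sum((e / z) * experts[i](x) for e, i in zip(exps, idx))

experts = [lambda x: 2 * x, lambda x: -x, lambda x: x + 1]
out = moe_forward(3.0, experts, gate_logits=[0.1, 2.0, 0.5], k=1)
print(out)  # -3.0: expert 1 wins the gate and runs alone
```

With k=1 each input follows a single expert subpath, which is why a collapsed router concentrating inputs on one expert can, as the paper observes, behave very differently under attack than a balanced one.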
tool · IEEE Transactions on Software ... · Jan 3, 2025

How Toxic Can You Get? Search-based Toxicity Testing for Large Language Models

Simone Corbo, Luca Bancale, Valeria De Gennaro et al. · Politecnico di Milano · Karlsruhe Institute of Technology

Evolutionary search-based tool that auto-generates fluent prompts to elicit toxic outputs from aligned LLMs, outperforming jailbreak baselines

Prompt Injection nlp
PDF
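The search-based pattern behind such a tool can be sketched with a harmless stand-in fitness function: an evolutionary loop that keeps the best candidates, mutates them, and repeats. The string-matching objective, operators, and constants below are all hypothetical; the paper's tool instead scores candidate prompts by the toxicity of the LLM's response while keeping them fluent.

```python
# Toy evolutionary search over strings (the paper optimizes prompts
# against a toxicity scorer; this harmless objective is a stand-in).
import random

TARGET = "hello world"  # stand-in objective: match this string
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def fitness(s):
    # Number of positions matching the target (proxy for a toxicity score).
    return sum(a == b for a, b in zip(s, TARGET))

def mutate(s):
    # Replace one random character.
    i = random.randrange(len(s))
    return s[:i] + random.choice(ALPHABET) + s[i + 1:]

def evolve(pop_size=50, generations=300, seed=0):
    random.seed(seed)
    pop = ["".join(random.choice(ALPHABET) for _ in TARGET)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # selection
        pop = survivors + [mutate(random.choice(survivors))
                           for _ in range(pop_size - len(survivors))]
    return max(pop, key=fitness)

best = evolve()
print(best, fitness(best))
```

Black-box search of this kind needs only the scorer's output per candidate, which is what lets such tools probe aligned LLMs without gradient access.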