ML Security Papers

Latest papers

5 papers

benchmark arXiv Feb 2, 2026 · 9w ago

AICD Bench: A Challenging Benchmark for AI-Generated Code Detection

Daniil Orel, Dilshod Azizov, Indraneil Paul et al. · Mohamed bin Zayed University of Artificial Intelligence · TU Darmstadt +1 more

Large-scale benchmark revealing AI-generated code detectors fail severely under distribution shift and adversarial conditions

Output Integrity Attack nlp

attack arXiv Jan 19, 2026 · 11w ago

ChartAttack: Testing the Vulnerability of LLMs to Malicious Prompting in Chart Generation

Jesus-German Ortiz-Barajas, Jonathan Tonglet, Vivek Gupta et al. · INSAIT · Sofia University +3 more

Jailbreaks MLLMs via adversarial prompting to auto-generate misleading charts, reducing human and MLLM QA accuracy by ~20 points

Prompt Injection multimodal vision nlp

defense arXiv Nov 26, 2025 · Nov 2025

Multimodal Robust Prompt Distillation for 3D Point Cloud Models

Xiang Gu, Liming Lu, Xu Zheng et al. · Nanjing University of Science and Technology · The Hong Kong University of Science and Technology (Guangzhou) +3 more

Defends 3D point cloud models against adversarial attacks via multimodal teacher-student prompt distillation with zero inference overhead

Input Manipulation Attack vision multimodal

attack arXiv Oct 28, 2025 · Oct 2025

SPEAR++: Scaling Gradient Inversion via Sparsely-Used Dictionary Learning

Alexander Bakarsky, Dimitar I. Dimitrov, Maximilian Baader et al. · ETH Zürich · INSAIT +1 more

Scales gradient inversion attacks in federated learning to 10x larger batch sizes using sparse dictionary learning

Model Inversion Attack federated-learning

defense arXiv Sep 16, 2025 · Sep 2025

CIARD: Cyclic Iterative Adversarial Robustness Distillation

Liming Lu, Shuchao Pang, Xu Zheng et al. · Nanjing University of Science and Technology · HKUST(GZ) +4 more

Defends lightweight student models against adversarial attacks via cyclic multi-teacher distillation with contrastive alignment and continuous adversarial retraining

Input Manipulation Attack vision
