Latest papers

4 papers
benchmark · arXiv · Jan 27, 2026

Unveiling Perceptual Artifacts: A Fine-Grained Benchmark for Interpretable AI-Generated Image Detection

Yao Xiao, Weiyan Chen, Jiahao Chen et al. · Sun Yat-Sen University · Xi’an Jiaotong University +3 more

Introduces the X-AIGD benchmark with pixel-level perceptual artifact annotations, enabling interpretable evaluation of AI-generated image detectors

Output Integrity Attack · vision
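Since the benchmark's annotations are pixel-level masks, a natural way to score a detector's artifact localization is mask intersection-over-union. This is a generic sketch, not the paper's evaluation protocol; the `mask_iou` helper and the empty-mask convention are assumptions:

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection-over-union between a predicted artifact mask and a
    pixel-level annotation (boolean arrays of the same shape)."""
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement (convention)
    return float(np.logical_and(pred, gt).sum() / union)
```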
attack · arXiv · Dec 26, 2025

Few Tokens Matter: Entropy Guided Attacks on Vision-Language Models

Mengqi He, Xinyu Tian, Xin Shen et al. · Australian National University · The University of Queensland +1 more

Targets high-entropy VLM decoding positions with adversarial visual perturbations, converting 35-49% of benign outputs to harmful content at a 93-95% attack success rate

Input Manipulation Attack · Prompt Injection · vision · nlp · multimodal
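The core idea of entropy-guided target selection can be illustrated in a few lines: compute the Shannon entropy of the model's next-token distribution at each decoding position and attack only the most uncertain ones. A toy sketch, not the paper's implementation; `token_entropy` and the top-k selection are assumptions:

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy (nats) of the softmax distribution over one logit vector."""
    z = logits - logits.max()          # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def select_high_entropy_positions(logit_seq, k=3):
    """Indices of the k decoding positions with the highest entropy, i.e.
    where the model is least certain and a visual perturbation is
    hypothesized to flip the output most easily."""
    ents = [token_entropy(l) for l in logit_seq]
    return sorted(np.argsort(ents)[-k:].tolist())
```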
defense · arXiv · Sep 21, 2025

MARS: A Malignity-Aware Backdoor Defense in Federated Learning

Wei Wan, Yuxuan Ning, Zhicong Huang et al. · City University of Macau · Australian National University +4 more

Defends federated learning against backdoor attacks using neuron-level backdoor energy and Wasserstein clustering to detect malicious model updates

Model Poisoning · federated-learning · vision
5 citations
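The detection idea can be sketched end to end: derive a per-neuron statistic from each client's update, then separate clients whose statistic distribution is far, in 1-Wasserstein distance, from the aggregate profile. This is a toy illustration only; using absolute update magnitude as a stand-in for MARS's "backdoor energy" and the mean-plus-2-sigma split are assumptions, not the paper's method:

```python
import numpy as np

def wasserstein_1d(a, b):
    """W1 distance between two equal-size 1-D empirical distributions:
    mean absolute difference of the sorted samples."""
    return float(np.abs(np.sort(a) - np.sort(b)).mean())

def flag_suspicious_clients(updates):
    """Toy sketch: per-neuron |update| magnitude stands in for a neuron-level
    'backdoor energy'; clients whose energy distribution sits far from the
    coordinate-wise median profile are flagged as potentially malicious."""
    energies = [np.abs(np.asarray(u)) for u in updates]
    median_profile = np.median(np.stack(energies), axis=0)
    dists = np.array([wasserstein_1d(e, median_profile) for e in energies])
    threshold = dists.mean() + 2 * dists.std()   # crude outlier split (assumption)
    return [i for i, d in enumerate(dists) if d > threshold]
```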
attack · arXiv · Aug 3, 2025

Are All Prompt Components Value-Neutral? Understanding the Heterogeneous Adversarial Robustness of Dissected Prompt in Large Language Models

Yujia Zheng, Tianhao Li, Haotian Huang et al. · Duke University · North China University of Technology +7 more

Attacks LLMs via component-wise text perturbations, revealing heterogeneous adversarial robustness across dissected prompt structures

Prompt Injection · nlp
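Component-wise perturbation can be sketched as: dissect the prompt into named parts, perturb exactly one part, and reassemble, so robustness can be measured per component. The component names (`instruction`, `context`, `question`) and the character-swap perturbation are illustrative assumptions, not the paper's taxonomy:

```python
import random

def perturb_component(components, target, rng=None, swap_rate=0.1):
    """Apply a character-swap perturbation to one prompt component only,
    leaving the other components intact."""
    rng = rng or random.Random(0)
    out = dict(components)
    chars = list(out[target])
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and rng.random() < swap_rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    out[target] = "".join(chars)
    return out

def assemble(components):
    """Reassemble the dissected prompt for querying the target LLM."""
    return "\n".join(components[k] for k in ("instruction", "context", "question"))
```

Running the same attack query once per component, and comparing output changes, gives a per-component robustness profile.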