ML Security Papers

Latest papers

12 papers

attack arXiv Mar 17, 2026 · 20d ago

Poisoning the Pixels: Revisiting Backdoor Attacks on Semantic Segmentation

Guangsheng Zhang, Huan Tian, Leo Zhang et al. · University of Technology Sydney · Griffith University +2 more

Backdoor framework for semantic segmentation introducing six attack vectors and optimized triggers, bypassing existing defenses

Model Poisoning Data Poisoning Attack vision

PDF

defense arXiv Mar 2, 2026 · 5w ago

Process Over Outcome: Cultivating Forensic Reasoning for Generalizable Multimodal Manipulation Detection

Yuchen Zhang, Yaxiong Wang, Kecheng Han et al. · Xi’an Jiaotong University · Hefei University of Technology +3 more

Proposes REFORM, a forensic-reasoning framework with curriculum learning and RL to generalize multimodal deepfake detection

Output Integrity Attack multimodalvisionnlpgenerative

PDF

survey arXiv Feb 24, 2026 · 5w ago

SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

Yanna Jiang, Delong Li, Haiyu Deng et al. · University of Technology Sydney · CSIRO

Surveys LLM agentic skill security covering marketplace supply-chain attacks, prompt injection via skill payloads, and trust-tiered execution

AI Supply Chain Attacks Prompt Injection Insecure Plugin Design nlpreinforcement-learning

PDF

benchmark arXiv Feb 18, 2026 · 6w ago

The Vulnerability of LLM Rankers to Prompt Injection Attacks

Yu Yin, Shuai Wang, Bevan Koopman et al. · The University of Queensland · CSIRO

Benchmarks indirect prompt injection attacks on LLM rankers, revealing encoder-decoder architectures are far more resilient than decoder-only models

Prompt Injection nlp

PDF Code

defense arXiv Feb 4, 2026 · 8w ago

SIDeR: Semantic Identity Decoupling for Unrestricted Face Privacy

Zhuosen Bao, Xia Du, Zheng Lin et al. · Xiamen University of Technology · University of Hong Kong +8 more

Generates unrestricted adversarial faces using diffusion models to evade facial recognition with 99% black-box success rate

Input Manipulation Attack visiongenerative

PDF

defense arXiv Nov 21, 2025 · Nov 2025

MMT-ARD: Multimodal Multi-Teacher Adversarial Distillation for Robust Vision-Language Models

Yuqi Li, Junhao Dong, Chuanguang Yang et al. · Nanyang Technological University · Institute of Computing Technology +4 more

Defends VLMs against adversarial examples via dual multi-teacher distillation, gaining +4.32% robust accuracy with 2.3x training speedup

Input Manipulation Attack visionmultimodal

2 citations PDF Code

attack arXiv Nov 19, 2025 · Nov 2025

When Harmless Words Harm: A New Threat to LLM Safety via Conceptual Triggers

Zhaoxin Zhang, Borui Chen, Yiming Hu et al. · City University of Macau · University of Vienna +3 more

Novel LLM jailbreak using conceptual morphology triggers to shift ideological orientation in outputs without triggering safety filters

Prompt Injection nlp

PDF

defense arXiv Nov 16, 2025 · Nov 2025

DINO-Detect: A Simple yet Effective Framework for Blur-Robust AI-Generated Image Detection

Jialiang Shen, Jiyang Zheng, Yunqi Xue et al. · The University of Sydney · Shanghai Jiao Tong University +3 more

Proposes blur-robust AI-generated image detector via DINO-based teacher-student knowledge distillation for real-world motion degradation

Output Integrity Attack vision

1 citations PDF Code

attack arXiv Oct 2, 2025 · Oct 2025

Mirage Fools the Ear, Mute Hides the Truth: Precise Targeted Adversarial Attacks on Polyphonic Sound Event Detection Systems

Junjie Su, Weifei Jin, Yuxin Cao et al. · Beijing University of Posts and Telecommunications · National University of Singapore +2 more

First targeted adversarial attack framework for polyphonic SED, inserting or deleting sound events with precise region control via preservation loss

Input Manipulation Attack audio

PDF

survey arXiv Sep 25, 2025 · Sep 2025

Responsible Diffusion: A Comprehensive Survey on Safety, Ethics, and Trust in Diffusion Models

Kang Wei, Xin Yuan, Fushuo Huo et al. · Southeast University · CSIRO +3 more

Comprehensive survey of security threats and countermeasures for diffusion models spanning robustness, privacy, backdoors, and content integrity

Input Manipulation Attack Output Integrity Attack Model Poisoning visiongenerativemultimodal

1 citations PDF

defense arXiv Sep 19, 2025 · Sep 2025

Backdoor Mitigation via Invertible Pruning Masks

Kealan Dunnett, Reza Arablouei, Dimity Miller et al. · Queensland University of Technology · CSIRO

Pruning-based backdoor defense using invertible masks and bi-level optimization to surgically remove backdoor behavior while preserving clean accuracy

Model Poisoning vision

PDF

tool arXiv Aug 11, 2025 · Aug 2025

From Prediction to Explanation: Multimodal, Explainable, and Interactive Deepfake Detection Framework for Non-Expert Users

Shahroz Tariq, Simon S. Woo, Priyanka Singh et al. · CSIRO · Sungkyunkwan University +1 more

Builds an explainable deepfake detection pipeline combining Grad-CAM, visual captioning, and LLM-generated narratives for non-expert users

Output Integrity Attack visionnlpmultimodal

PDF

Latest papers

Poisoning the Pixels: Revisiting Backdoor Attacks on Semantic Segmentation

Process Over Outcome: Cultivating Forensic Reasoning for Generalizable Multimodal Manipulation Detection

SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

The Vulnerability of LLM Rankers to Prompt Injection Attacks

SIDeR: Semantic Identity Decoupling for Unrestricted Face Privacy

MMT-ARD: Multimodal Multi-Teacher Adversarial Distillation for Robust Vision-Language Models

When Harmless Words Harm: A New Threat to LLM Safety via Conceptual Triggers

DINO-Detect: A Simple yet Effective Framework for Blur-Robust AI-Generated Image Detection

Mirage Fools the Ear, Mute Hides the Truth: Precise Targeted Adversarial Attacks on Polyphonic Sound Event Detection Systems

Responsible Diffusion: A Comprehensive Survey on Safety, Ethics, and Trust in Diffusion Models

Backdoor Mitigation via Invertible Pruning Masks

From Prediction to Explanation: Multimodal, Explainable, and Interactive Deepfake Detection Framework for Non-Expert Users

Filters

Time Period

Paper Type

OWASP ML Top 10

OWASP LLM Top 10

Institution

Venue