ML Security Papers

Latest papers

10 papers

attack arXiv Mar 13, 2026

DAST: A Dual-Stream Voice Anonymization Attacker with Staged Training

Ridwan Arefeen, Xiaoxiao Miao, Rong Tong et al. · Singapore Institute of Technology · Duke Kunshan University +1 more

Dual-stream speaker re-identification attack on anonymized voice using SSL and spectral features with staged transfer learning

Input Manipulation Attack audio

PDF

attack arXiv Jan 26, 2026

ARMOR: Agentic Reasoning for Methods Orchestration and Reparameterization for Robust Adversarial Attacks

Gabriel Lee Jun Rong, Christos Korgialas, Dion Jia Xu Ho et al. · Singapore Institute of Technology · Aristotle University of Thessaloniki +3 more

Agentic VLM/LLM system orchestrates CW, JSMA, and STA attacks to evade deepfake detectors with improved black-box transfer

Input Manipulation Attack vision multimodal nlp

PDF

benchmark arXiv Nov 10, 2025 · Nov 2025

EduGuardBench: A Holistic Benchmark for Evaluating the Pedagogical Fidelity and Adversarial Safety of LLMs as Simulated Teachers

Yilin Jiang, Mingzi Zhang, Xuanyu Yin et al. · Zhejiang University of Technology · Hong Kong University of Science and Technology +3 more

Benchmark evaluating teacher-persona jailbreaks on LLMs, revealing a scaling paradox where mid-sized models are most vulnerable

Prompt Injection nlp

PDF Code

defense ICDMW Oct 29, 2025 · Oct 2025

SmoothGuard: Defending Multimodal Large Language Models with Noise Perturbation and Clustering Aggregation

Guangzhi Su, Shuchang Huang, Yutong Ke et al. · Duke Kunshan University

Defends MLLMs against adversarial visual and audio inputs using randomized noise injection and clustering-based output aggregation

Input Manipulation Attack Prompt Injection vision audio multimodal nlp

PDF

attack arXiv Oct 14, 2025 · Oct 2025

MS-GAGA: Metric-Selective Guided Adversarial Generation Attack

Dion J. X. Ho, Gabriel Lee Jun Rong, Niharika Shrivastava et al. · Columbia University · Singapore Institute of Technology +1 more

Dual-stream PGD attack crafts transferable, imperceptible adversarial examples that evade black-box deepfake detectors, improving on SOTA by 27%

Input Manipulation Attack vision

2 citations PDF

defense arXiv Sep 30, 2025 · Sep 2025

SafeBehavior: Simulating Human-Like Multistage Reasoning to Mitigate Jailbreak Attacks in Large Language Models

Qinjian Zhao, Jiaqi Wang, Zhiqiang Gao et al. · Wenzhou-Kean University · University of Bremen +2 more

Three-stage LLM jailbreak defense using intention inference, self-introspection, and self-revision to counter optimization-based and prompt-based attacks

Input Manipulation Attack Prompt Injection nlp

PDF

benchmark arXiv Sep 25, 2025 · Sep 2025

The Impact of Audio Watermarking on Audio Anti-Spoofing Countermeasures

Zhenshan Zhang, Xueping Zhang, Yechen Wang et al. · Duke Kunshan University · Inc.

Benchmarks audio watermarking's degradation of deepfake detectors and proposes KPWL adaptation framework to restore robustness

Output Integrity Attack audio

PDF Code

defense arXiv Sep 1, 2025 · Sep 2025

Unraveling LLM Jailbreaks Through Safety Knowledge Neurons

Chongwen Zhao, Yutong Ke, Kaizhu Huang · Duke Kunshan University

Identifies safety-critical neurons in LLMs and proposes SafeTuning to reinforce them against jailbreak attacks

Prompt Injection nlp

PDF

attack arXiv Aug 26, 2025 · Aug 2025

SegReConcat: A Data Augmentation Method for Voice Anonymization Attack

Ridwan Arefeen, Xiaoxiao Miao, Rong Tong et al. · Singapore Institute of Technology · Duke Kunshan University +1 more

Attacks voice anonymization systems by augmenting ASV training data via word-level segment rearrangement to recover speaker identity

Output Integrity Attack audio

PDF Code

defense arXiv Aug 4, 2025 · Aug 2025

Localizing Audio-Visual Deepfakes via Hierarchical Boundary Modeling

Xuanjun Chen, Shih-Peng Cheng, Jiawei Du et al. · National Taiwan University · Johns Hopkins University +1 more

Novel hierarchical boundary modeling network that temporally localizes manipulated segments in audio-visual deepfake content

Output Integrity Attack multimodal audio vision

PDF
