ML Security Papers

Latest papers

6 papers

defense arXiv Mar 6, 2026 · 4w ago

Word-Anchored Temporal Forgery Localization

Tianyi Wang, Xi Shao, Harry Cheng et al. · National University of Singapore · Nanjing University of Posts and Telecommunications +1 more

Detects audio-visual deepfake segments via word-token binary classification, outperforming regression-based temporal forgery localization (TFL) baselines

Output Integrity Attack · audio · vision · multimodal

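The word-anchored framing can be illustrated with a small post-processing step: if each transcribed word carries timestamps and a per-word forgery probability, runs of consecutive words classified as forged collapse into forged time intervals. This is a minimal sketch under stated assumptions, not the authors' pipeline: the `Word` record, the `words_to_segments` helper, the 0.5 threshold, and the availability of word timestamps (for example from a forced aligner) are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # seconds (hypothetical word-level timestamp)
    end: float    # seconds

def words_to_segments(words, fake_probs, threshold=0.5):
    """Merge runs of consecutive words classified as forged into (start, end) intervals."""
    if len(words) != len(fake_probs):
        raise ValueError("expected one probability per word")
    segments = []
    prev_fake = False
    for word, p in zip(words, fake_probs):
        is_fake = p >= threshold
        if is_fake and prev_fake:
            # Extend the current forged interval to cover this word.
            segments[-1] = (segments[-1][0], word.end)
        elif is_fake:
            # Start a new forged interval at this word.
            segments.append((word.start, word.end))
        prev_fake = is_fake
    return segments

# Hypothetical transcript with word timestamps and per-word forgery scores.
words = [Word("the", 0.0, 0.2), Word("meeting", 0.2, 0.6), Word("was", 0.6, 0.8),
         Word("cancelled", 0.8, 1.4), Word("today", 1.4, 1.8)]
probs = [0.1, 0.2, 0.9, 0.8, 0.3]
print(words_to_segments(words, probs))  # [(0.6, 1.4)]
```

Because intervals are assembled from whole words, every predicted boundary coincides with a word start or end time.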

attack arXiv Feb 11, 2026 · 7w ago

Transferable Backdoor Attacks for Code Models via Sharpness-Aware Adversarial Perturbation

Shuyu Chang, Haiping Huang, Yanjun Zhang et al. · Nanjing University of Posts and Telecommunications · State Key Laboratory of Tibetan Intelligence +5 more

Backdoor attack on code models using sharpness-aware training and Gumbel-Softmax triggers for cross-dataset transferability and stealthiness

Model Poisoning · nlp


Code models are increasingly adopted in software development but remain vulnerable to backdoor attacks via poisoned training data. Existing backdoor attacks on code models face a fundamental trade-off between transferability and stealthiness. Static trigger-based attacks insert fixed dead code patterns that transfer well across models and datasets but are easily detected by code-specific defenses. In contrast, dynamic trigger-based attacks adaptively generate context-aware triggers to evade detection but suffer from poor cross-dataset transferability. Moreover, these dynamic attacks rely on the unrealistic assumption that the poisoned and victim training data share an identical distribution, limiting their practicality. To overcome these limitations, we propose Sharpness-aware Transferable Adversarial Backdoor (STAB), a novel attack that achieves both transferability and stealthiness without requiring complete victim data. STAB is motivated by the observation that adversarial perturbations in flat regions of the loss landscape transfer more effectively across datasets than those in sharp minima. To this end, we train a surrogate model using Sharpness-Aware Minimization to guide model parameters toward flat loss regions, and employ Gumbel-Softmax optimization to enable differentiable search over discrete trigger tokens for generating context-aware adversarial triggers. Experiments across three datasets and two code models show that STAB outperforms prior attacks in terms of transferability and stealthiness. It achieves a 73.2% average attack success rate after defenses are applied, whereas static trigger-based attacks fail under the same defenses. STAB also surpasses the best dynamic trigger-based attack by 12.4% in cross-dataset attack success rate and maintains performance on clean inputs.

transformer

Nanjing University of Posts and Telecommunications · State Key Laboratory of Tibetan Intelligence · Jiangsu Provincial Key Laboratory of Internet of Things Intelligent Perception and Computing +4 more

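The two ingredients named in the STAB abstract are standard techniques, and a toy NumPy sketch may help make them concrete. `sam_step` applies the usual Sharpness-Aware Minimization update (perturb the weights by ρ along the normalized gradient, then descend using the gradient at the perturbed point), and `gumbel_softmax` draws a relaxed one-hot sample over a token vocabulary. The quadratic loss, step sizes, vocabulary size, and helper names are illustrative assumptions; STAB's actual surrogate model, trigger objective, and hyperparameters are not reproduced here, and a real implementation would run inside an autodiff framework so gradients can flow through the relaxed sample.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One Sharpness-Aware Minimization step on parameters w."""
    g = grad_fn(w)
    # Move to the first-order worst-case point inside an L2 ball of radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descend with the gradient evaluated at that perturbed point.
    return w - lr * grad_fn(w + eps)

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Relaxed one-hot sample over the last axis (e.g. a trigger-token vocabulary)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(1e-12, 1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))           # standard Gumbel(0, 1) noise
    y = (logits + gumbel) / tau            # lower tau gives samples closer to one-hot
    y = y - y.max(axis=-1, keepdims=True)  # numerical stability before exp
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

# Toy quadratic loss with one flat and one sharp direction (hypothetical).
A = np.diag([1.0, 10.0])

def loss(w):
    return 0.5 * w @ A @ w

def grad(w):
    return A @ w

w0 = np.array([1.0, 1.0])
w = w0.copy()
for _ in range(50):
    w = sam_step(w, grad)

# One trigger slot over a hypothetical 4-token vocabulary.
rng = np.random.default_rng(0)
token_logits = np.array([[1.0, 2.0, 3.0, 0.5]])
relaxed = gumbel_softmax(token_logits, tau=0.5, rng=rng)
```

In a differentiable trigger search, a relaxed vector like `relaxed` would typically be multiplied into an embedding matrix in place of a hard token lookup; that wiring is an assumption and is not shown.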

attack arXiv Jan 29, 2026 · 9w ago

Noise as a Probe: Membership Inference Attacks on Diffusion Models Leveraging Initial Noise

Puwei Lian, Yujun Cai, Songze Li et al. · Southeast University · The University of Queensland +1 more

Exploits residual semantics in diffusion model noise schedules to perform black-box membership inference without auxiliary data

Membership Inference Attack · vision · generative


attack arXiv Jan 28, 2026 · 9w ago

ICON: Intent-Context Coupling for Efficient Multi-Turn Jailbreak Attack

Xingwei Lin, Wenhao Lin, Sicong Cao et al. · Zhejiang University · Nanjing University of Posts and Telecommunications +2 more

Exploits intent-context coupling in multi-turn jailbreaks to bypass LLM safety with a 97.1% attack success rate

Prompt Injection · nlp


attack arXiv Nov 10, 2025 · Nov 2025

Differentiated Directional Intervention: A Framework for Evading LLM Safety Alignment

Peng Zhang, Peijie Sun · Nanjing University of Posts and Telecommunications

White-box activation attack decomposes LLM safety alignment into two directions and neutralizes both, achieving 97.88% jailbreak success on Llama-2

Prompt Injection · nlp

1 citation
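Activation-level interventions of this kind are usually built from two generic linear-algebra primitives: estimating a direction in hidden-state space (often as a difference of mean activations between two prompt sets) and projecting that direction out of the hidden states. The sketch below shows only those primitives, removing two directions at once, with random arrays standing in for real model activations. How the paper actually derives its two safety-related directions, and where in the model it intervenes, is not described in this summary, so `mean_difference_direction` and `ablate_directions` are hypothetical helpers rather than the authors' method.

```python
import numpy as np

def mean_difference_direction(acts_a, acts_b):
    """Unit vector from the mean of acts_b to the mean of acts_a (rows are samples)."""
    d = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate_directions(h, directions):
    """Project every row of h onto the orthogonal complement of the given directions."""
    # Orthonormalize first so overlapping directions are not subtracted twice.
    q, _ = np.linalg.qr(np.stack(directions, axis=1))  # shape (d_model, k)
    return h - (h @ q) @ q.T

# Toy usage with random stand-ins for hidden states of width 16 (hypothetical data).
rng = np.random.default_rng(0)
d_model = 16
r1 = mean_difference_direction(rng.normal(1.0, 1.0, (32, d_model)),
                               rng.normal(0.0, 1.0, (32, d_model)))
r2 = mean_difference_direction(rng.normal(0.0, 1.0, (32, d_model)),
                               rng.normal(-0.5, 1.0, (32, d_model)))
hidden = rng.normal(size=(8, d_model))
edited = ablate_directions(hidden, [r1, r2])
```

After the projection, every edited row has (numerically) zero component along both directions, while components orthogonal to their span are left unchanged.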

benchmark arXiv Oct 26, 2025 · Oct 2025

DeepfakeBench-MM: A Comprehensive Benchmark for Multimodal Deepfake Detection

Kangran Zhao, Yupeng Chen, Xiaoyu Zhang et al. · The Chinese University of Hong Kong · State University of New York +1 more

Proposes the largest multimodal deepfake benchmark (1.1M forged samples, 21 pipelines) and unified evaluation framework for audiovisual deepfake detection

Output Integrity Attack · vision · audio · multimodal

1 citation
