Latest papers

9 papers
defense · arXiv · Feb 5, 2026

Surgery: Mitigating Harmful Fine-Tuning for Large Language Models via Attention Sink

Guozhi Liu, Weiwei Lin, Tiansheng Huang et al. · South China University of Technology · Pengcheng Laboratory +1 more

Defends LLM safety alignment during fine-tuning by regularizing attention sink divergence to prevent harmful pattern learning

Transfer Learning Attack · nlp
PDF · Code
attack · arXiv · Feb 3, 2026

Time Is All It Takes: Spike-Retiming Attacks on Event-Driven Spiking Neural Networks

Yi Yu, Qixin Zhang, Shuhan Ye et al. · Nanyang Technological University · Chinese University of Hong Kong +2 more

Gradient-based timing-only adversarial attack on event-driven SNNs retimes spikes to cause misclassification while preserving spike counts

Input Manipulation Attack · vision
2 citations · PDF · Code
defense · arXiv · Jan 22, 2026

Feature-Space Adversarial Robustness Certification for Multimodal Large Language Models

Song Xia, Meiwen Ding, Chenqi Kong et al. · Nanyang Technological University · Pengcheng Laboratory

Certified feature-space robustness framework defends multimodal LLMs against ℓ2-bounded adversarial perturbations via Gaussian smoothing

Input Manipulation Attack · vision · nlp · multimodal
PDF
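Certification against ℓ2-bounded perturbations via Gaussian smoothing usually refers to randomized smoothing: classify many Gaussian-noised copies of the input and convert the top-class frequency into a certified radius. A minimal sketch of that generic technique (not this paper's feature-space variant), with a hypothetical toy classifier standing in for a real model:

```python
import numpy as np
from statistics import NormalDist

def certify_l2(classifier, x, sigma=0.25, n=1000, rng=None):
    """Generic randomized smoothing: vote over n Gaussian-noised copies
    of x and return (top class, certified l2 radius). The radius uses the
    Cohen et al.-style bound R = sigma * Phi^{-1}(p_top), with the empirical
    top-class frequency as a plug-in estimate of p_top."""
    rng = np.random.default_rng(rng)
    votes = {}
    for _ in range(n):
        c = classifier(x + sigma * rng.standard_normal(x.shape))
        votes[c] = votes.get(c, 0) + 1
    top = max(votes, key=votes.get)
    p_top = min(votes[top] / n, 1.0 - 1e-6)  # cap to keep the radius finite
    if p_top <= 0.5:
        return top, 0.0  # abstain: no certificate
    return top, sigma * NormalDist().inv_cdf(p_top)

# Hypothetical toy classifier for illustration only.
clf = lambda v: int(v.sum() > 0)
label, radius = certify_l2(clf, np.ones(4), sigma=0.25, n=2000, rng=0)
```

In practice the noisy forward passes dominate the cost, so `n` trades certification tightness against inference budget.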
defense · arXiv · Dec 16, 2025

FakeRadar: Probing Forgery Outliers to Detect Unknown Deepfake Videos

Zhaolun Li, Jichang Li, Yinqi Cai et al. · Guilin University of Electronic Technology · Pengcheng Laboratory +3 more

Deepfake video detector that synthesizes forgery outliers via CLIP features to generalize across unseen manipulation types

Output Integrity Attack · vision
3 citations · PDF
attack · arXiv · Dec 3, 2025

Automatic Attack Discovery for Few-Shot Class-Incremental Learning via Large Language Models

Haidong Kang, Wei Wu, Hanling Wang · Northeastern University · University of Electronic Science and Technology of China +1 more

Uses LLMs with PPO reinforcement learning to auto-discover adversarial attacks that outperform PGD/FGSM against few-shot class-incremental learning systems

Input Manipulation Attack · vision · nlp
PDF
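The PGD/FGSM baselines that the discovered attacks are compared against are the standard gradient attacks. A minimal numpy sketch under a toy analytic loss (all names illustrative): FGSM takes one signed-gradient step, and PGD iterates smaller steps with projection back into the ℓ∞ ball:

```python
import numpy as np

def fgsm(x, grad_fn, eps):
    """FGSM: one step of size eps along the sign of the loss gradient."""
    return x + eps * np.sign(grad_fn(x))

def pgd(x, grad_fn, eps, alpha, steps):
    """PGD: iterated signed-gradient steps of size alpha, projected back
    into the l_inf ball of radius eps around the original input x."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

# Toy setup: linear logit f(x) = w.x with true label 1, loss = -w.x,
# so the attack pushes x against w (hypothetical, for illustration).
w = np.array([1.0, -2.0, 0.5])
grad = lambda x: -w            # d(loss)/dx for loss = -w.x
x0 = np.array([0.2, 0.1, -0.3])
x_f = fgsm(x0, grad, eps=0.1)
x_p = pgd(x0, grad, eps=0.1, alpha=0.03, steps=10)
```

On this linear toy loss PGD saturates the ℓ∞ ball and lands where FGSM does; on a real nonlinear network the iterated, projected version is typically the stronger attack.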
defense · arXiv · Nov 24, 2025

Towards Generalizable Deepfake Detection via Forgery-aware Audio-Visual Adaptation: A Variational Bayesian Approach

Fan Nie, Jiangqun Ni, Jian Zhang et al. · Sun Yat-Sen University · Pengcheng Laboratory +4 more

Variational Bayesian framework detects audio-visual deepfakes by modeling cross-modal inconsistencies as Gaussian latent variables

Output Integrity Attack · multimodal · vision · audio · generative
1 citation · PDF
defense · arXiv · Oct 29, 2025

DeepShield: Fortifying Deepfake Video Detection with Local and Global Forgery Analysis

Yinqi Cai, Jichang Li, Zhaolun Li et al. · Guilin University of Electronic Technology · Sun Yat-Sen University +2 more

Detects deepfake face videos across unseen manipulations via CLIP-ViT with local patch and global domain-augmentation modules

Output Integrity Attack · vision · generative
4 citations · 1 influential · PDF · Code
defense · arXiv · Oct 11, 2025

Pharmacist: Safety Alignment Data Curation for Large Language Models against Harmful Fine-tuning

Guozhi Liu, Qi Mu, Tiansheng Huang et al. · South China University of Technology · Ltd. +4 more

Curates safety-critical alignment data subsets to harden LLMs against harmful fine-tuning attacks while cutting training time by ~57%

Transfer Learning Attack · Prompt Injection · nlp
2 citations · 1 influential · PDF · Code
attack · TPAMI · Sep 23, 2025

SEGA: A Transferable Signed Ensemble Gaussian Black-Box Attack against No-Reference Image Quality Assessment Models

Yujia Liu, Dingquan Li, Zhixuan Li et al. · Peking University · Pengcheng Laboratory +1 more

Proposes SEGA, the first transferable black-box adversarial attack against NR-IQA models using signed ensemble Gaussian gradient estimation

Input Manipulation Attack · vision
PDF
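Signed Gaussian gradient estimation is, generically, a zeroth-order technique: probe the black-box score with random Gaussian directions and keep only the sign of each score change. A minimal sketch of that generic estimator (not SEGA's ensemble specifics), with a hypothetical scalar score standing in for an NR-IQA model:

```python
import numpy as np

def signed_gaussian_grad(score_fn, x, sigma=0.01, n_probes=50, rng=None):
    """Zeroth-order gradient estimate for a black-box scalar score:
    average Gaussian probe directions weighted by the *sign* of the
    score change they induce (a sign-only variant of NES-style
    gradient estimation)."""
    rng = np.random.default_rng(rng)
    g = np.zeros_like(x)
    base = score_fn(x)
    for _ in range(n_probes):
        u = rng.standard_normal(x.shape)
        g += np.sign(score_fn(x + sigma * u) - base) * u
    return g / n_probes

# Hypothetical black-box "quality score" for illustration.
score = lambda v: -np.sum(v ** 2)
x = np.array([1.0, -1.0, 0.5])
g = signed_gaussian_grad(score, x, rng=0)
# A black-box attack step would then move x along sign(g).
```

Using only the sign of each probe's score change discards magnitude information but makes the estimate robust to the score's scale, which helps when transferring across models.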