Latest papers

9 papers
defense · arXiv · Feb 5, 2026

Surgery: Mitigating Harmful Fine-Tuning for Large Language Models via Attention Sink

Guozhi Liu, Weiwei Lin, Tiansheng Huang et al. · South China University of Technology · Pengcheng Laboratory +1 more

Defends LLM safety alignment during fine-tuning by regularizing attention sink divergence to prevent harmful pattern learning

Transfer Learning Attack · nlp
PDF · Code
attack · arXiv · Feb 3, 2026

Time Is All It Takes: Spike-Retiming Attacks on Event-Driven Spiking Neural Networks

Yi Yu, Qixin Zhang, Shuhan Ye et al. · Nanyang Technological University · Chinese University of Hong Kong +2 more

Gradient-based timing-only adversarial attack on event-driven SNNs retimes spikes to cause misclassification while preserving spike counts

Input Manipulation Attack · vision
2 citations · PDF · Code
defense · arXiv · Jan 22, 2026

Feature-Space Adversarial Robustness Certification for Multimodal Large Language Models

Song Xia, Meiwen Ding, Chenqi Kong et al. · Nanyang Technological University · Pengcheng Laboratory

Certified feature-space robustness framework defends multimodal LLMs against ℓ2-bounded adversarial perturbations via Gaussian smoothing

Input Manipulation Attack · vision · nlp · multimodal
PDF
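Certification against ℓ2-bounded perturbations via Gaussian smoothing usually refers to randomized smoothing: classify many Gaussian-noised copies of the input and convert the top-class frequency into a certified radius. A minimal sketch of that generic technique (not this paper's feature-space variant), with a hypothetical toy classifier standing in for a real model:

```python
import numpy as np
from statistics import NormalDist

def certify_l2(classifier, x, sigma=0.25, n=1000, rng=None):
    """Generic randomized smoothing: vote over n Gaussian-noised copies
    of x and return (top class, certified l2 radius). The radius uses the
    Cohen et al.-style bound R = sigma * Phi^{-1}(p_top), with the empirical
    top-class frequency as a plug-in estimate of p_top."""
    rng = np.random.default_rng(rng)
    votes = {}
    for _ in range(n):
        c = classifier(x + sigma * rng.standard_normal(x.shape))
        votes[c] = votes.get(c, 0) + 1
    top = max(votes, key=votes.get)
    p_top = min(votes[top] / n, 1.0 - 1e-6)  # cap to keep the radius finite
    if p_top <= 0.5:
        return top, 0.0  # abstain: no certificate
    return top, sigma * NormalDist().inv_cdf(p_top)

# Hypothetical toy classifier for illustration only.
clf = lambda v: int(v.sum() > 0)
label, radius = certify_l2(clf, np.ones(4), sigma=0.25, n=2000, rng=0)
```

In practice the noisy forward passes dominate the cost, so `n` trades certification tightness against inference budget.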
defense · arXiv · Dec 16, 2025

FakeRadar: Probing Forgery Outliers to Detect Unknown Deepfake Videos

Zhaolun Li, Jichang Li, Yinqi Cai et al. · Guilin University of Electronic Technology · Pengcheng Laboratory +3 more

Deepfake video detector that synthesizes forgery outliers via CLIP features to generalize across unseen manipulation types

Output Integrity Attack · vision
3 citations · PDF
attack · arXiv · Dec 3, 2025

Automatic Attack Discovery for Few-Shot Class-Incremental Learning via Large Language Models

Haidong Kang, Wei Wu, Hanling Wang · Northeastern University · University of Electronic Science and Technology of China +1 more

Uses LLMs with PPO reinforcement learning to auto-discover adversarial attacks that outperform PGD/FGSM against few-shot class-incremental learning systems

Input Manipulation Attack · vision · nlp
PDF
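The PGD/FGSM baselines that the discovered attacks are compared against are the standard gradient attacks. A minimal numpy sketch under a toy analytic loss (all names illustrative): FGSM takes one signed-gradient step, and PGD iterates smaller steps with projection back into the ℓ∞ ball:

```python
import numpy as np

def fgsm(x, grad_fn, eps):
    """FGSM: one step of size eps along the sign of the loss gradient."""
    return x + eps * np.sign(grad_fn(x))

def pgd(x, grad_fn, eps, alpha, steps):
    """PGD: iterated signed-gradient steps of size alpha, projected back
    into the l_inf ball of radius eps around the original input x."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

# Toy setup: linear logit f(x) = w.x with true label 1, loss = -w.x,
# so the attack pushes x against w (hypothetical, for illustration).
w = np.array([1.0, -2.0, 0.5])
grad = lambda x: -w            # d(loss)/dx for loss = -w.x
x0 = np.array([0.2, 0.1, -0.3])
x_f = fgsm(x0, grad, eps=0.1)
x_p = pgd(x0, grad, eps=0.1, alpha=0.03, steps=10)
```

On this linear toy loss PGD saturates the ℓ∞ ball and lands where FGSM does; on a real nonlinear network the iterated, projected version is typically the stronger attack.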
defense · arXiv · Nov 24, 2025

Towards Generalizable Deepfake Detection via Forgery-aware Audio-Visual Adaptation: A Variational Bayesian Approach

Fan Nie, Jiangqun Ni, Jian Zhang et al. · Sun Yat-Sen University · Pengcheng Laboratory +4 more

Variational Bayesian framework detects audio-visual deepfakes by modeling cross-modal inconsistencies as Gaussian latent variables

Output Integrity Attack · multimodal · vision · audio · generative
1 citation · PDF
defense · arXiv · Oct 29, 2025

DeepShield: Fortifying Deepfake Video Detection with Local and Global Forgery Analysis

Yinqi Cai, Jichang Li, Zhaolun Li et al. · Guilin University of Electronic Technology · Sun Yat-Sen University +2 more

Detects deepfake face videos across unseen manipulations via CLIP-ViT with local patch and global domain-augmentation modules

Output Integrity Attack · vision · generative
4 citations · 1 influential · PDF · Code
defense · arXiv · Oct 11, 2025

Pharmacist: Safety Alignment Data Curation for Large Language Models against Harmful Fine-tuning

Guozhi Liu, Qi Mu, Tiansheng Huang et al. · South China University of Technology · Ltd. +4 more

Curates safety-critical alignment data subsets to harden LLMs against harmful fine-tuning attacks while cutting training time by ~57%

Transfer Learning Attack · Prompt Injection · nlp
2 citations · 1 influential · PDF · Code
attack · TPAMI · Sep 23, 2025

SEGA: A Transferable Signed Ensemble Gaussian Black-Box Attack against No-Reference Image Quality Assessment Models

Yujia Liu, Dingquan Li, Zhixuan Li et al. · Peking University · Pengcheng Laboratory +1 more

Proposes SEGA, the first transferable black-box adversarial attack against NR-IQA models using signed ensemble Gaussian gradient estimation

Input Manipulation Attack · vision
PDF
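Signed Gaussian gradient estimation is, generically, a zeroth-order technique: probe the black-box score with random Gaussian directions and keep only the sign of each score change. A minimal sketch of that generic estimator (not SEGA's ensemble specifics), with a hypothetical scalar score standing in for an NR-IQA model:

```python
import numpy as np

def signed_gaussian_grad(score_fn, x, sigma=0.01, n_probes=50, rng=None):
    """Zeroth-order gradient estimate for a black-box scalar score:
    average Gaussian probe directions weighted by the *sign* of the
    score change they induce (a sign-only variant of NES-style
    gradient estimation)."""
    rng = np.random.default_rng(rng)
    g = np.zeros_like(x)
    base = score_fn(x)
    for _ in range(n_probes):
        u = rng.standard_normal(x.shape)
        g += np.sign(score_fn(x + sigma * u) - base) * u
    return g / n_probes

# Hypothetical black-box "quality score" for illustration.
score = lambda v: -np.sum(v ** 2)
x = np.array([1.0, -1.0, 0.5])
g = signed_gaussian_grad(score, x, rng=0)
# A black-box attack step would then move x along sign(g).
```

Using only the sign of each probe's score change discards magnitude information but makes the estimate robust to the score's scale, which helps when transferring across models.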