Latest papers

17 papers
defense arXiv Apr 2, 2026 · 4d ago

Diffusion-Guided Adversarial Perturbation Injection for Generalizable Defense Against Facial Manipulations

Yue Li, Linying Xue, Kaiqing Lin et al. · Huaqiao University · Shenzhen University +2 more

Diffusion-guided adversarial perturbation defense protecting facial images from deepfake manipulation in both white-box and black-box settings

Input Manipulation Attack vision generative
PDF
defense arXiv Mar 3, 2026 · 4w ago

SaFeR-ToolKit: Structured Reasoning via Virtual Tool Calling for Multimodal Safety

Zixuan Xu, Tiancheng He, Huahui Yi et al. · Huazhong University of Science and Technology · Beijing University of Posts and Telecommunications +2 more

Structured virtual tool-calling framework trains VLMs to reason explicitly about safety, blocking multimodal jailbreaks while reducing over-refusal

Prompt Injection multimodal vision nlp
PDF Code
defense arXiv Feb 10, 2026 · 7w ago

Zero-Sacrifice Persistent-Robustness Adversarial Defense for Pre-Trained Encoders

Zhuxin Lei, Ziyuan Yang, Yi Zhang · Sichuan University · Tianfu Jiangxi Laboratory

Dual-branch defense for SSL encoders that resists adversarial examples across downstream tasks without sacrificing benign performance

Input Manipulation Attack vision
PDF Code
defense arXiv Feb 4, 2026 · 8w ago

SIDeR: Semantic Identity Decoupling for Unrestricted Face Privacy

Zhuosen Bao, Xia Du, Zheng Lin et al. · Xiamen University of Technology · University of Hong Kong +8 more

Generates unrestricted adversarial faces using diffusion models to evade facial recognition with 99% black-box success rate

Input Manipulation Attack vision generative
PDF
attack arXiv Jan 20, 2026 · 10w ago

LURE: Latent Space Unblocking for Multi-Concept Reawakening in Diffusion Models

Mengyu Sun, Ziyuan Yang, Andrew Beng Jin Teoh et al. · Sichuan University · The Hong Kong Polytechnic University +1 more

Attacks concept erasure defenses in diffusion models by reconstructing latent space to reawaken multiple suppressed concepts simultaneously

Input Manipulation Attack vision generative
PDF Code
defense arXiv Jan 3, 2026 · Jan 2026

IO-RAE: Information-Obfuscation Reversible Adversarial Example for Audio Privacy Protection

Jiajie Zhu, Xia Du, Xiaoyuan Liu et al. · Xiamen University of Technology · Sichuan University +2 more

Reversible adversarial audio perturbations fool ASR systems into wrong transcriptions while authorized parties recover the original audio losslessly

Input Manipulation Attack audio
PDF
defense arXiv Dec 7, 2025 · Dec 2025

Cognitive Control Architecture (CCA): A Lifecycle Supervision Framework for Robustly Aligned AI Agents

Zhibo Liang, Tianze Hu, Zaiye Chen et al. · Sichuan University

Defends LLM agents against indirect prompt injection via pre-generated intent graphs and tiered action deviation detection across the full task lifecycle

Prompt Injection Excessive Agency nlp
PDF
benchmark arXiv Dec 5, 2025 · Dec 2025

TeleAI-Safety: A comprehensive LLM jailbreaking benchmark towards attacks, defenses, and evaluations

Xiuyuan Chen, Jian Zhao, Yuxiang He et al. · Institute of Artificial Intelligence (TeleAI) of China Telecom · Shanghai Jiao Tong University +6 more

Benchmarks LLM jailbreak robustness across 19 attacks, 29 defenses, and 19 evaluators on 14 models in a unified reproducible framework

Prompt Injection nlp
2 citations PDF Code
benchmark arXiv Nov 28, 2025 · Nov 2025

DEAL-300K: Diffusion-based Editing Area Localization with a 300K-Scale Dataset and Frequency-Prompted Baseline

Rui Zhang, Hongxia Wang, Hangqing Liu et al. · Sichuan University

Presents a 300K-image benchmark and a frequency-prompted vision foundation model (VFM) baseline for localizing diffusion-based image edits at the pixel level

Output Integrity Attack vision generative
PDF Code
attack arXiv Nov 18, 2025 · Nov 2025

Stealth Fine-Tuning: Efficiently Breaking Alignment in RVLMs Using Self-Generated CoT

Le Yu, Zhengyue Zhao, Yawen Zheng et al. · Sichuan University · University of Wisconsin–Madison +2 more

Breaks RVLM safety alignment via QLoRA fine-tuning on self-generated harmful CoT traces with 499 samples in under 3 hours

Transfer Learning Attack Prompt Injection multimodal nlp
PDF
attack arXiv Nov 11, 2025 · Nov 2025

A Small Leak Sinks All: Exploring the Transferable Vulnerability of Source Code Models

Weiye Li, Wenyi Tang · Sichuan University

Proposes a victim-agnostic RL-based adversarial attack on source code models that transfers to LLM4Code models with a 64% success rate

Input Manipulation Attack nlp
PDF
attack Pattern Recognition Nov 3, 2025 · Nov 2025

Beyond Deceptive Flatness: Dual-Order Solution for Strengthening Adversarial Transferability

Zhixuan Zhang, Pingyu Wang, Xingjian Zheng et al. · Sichuan University · Frost Drill Intellectual Software Pte. Ltd +1 more

Black-box transferable adversarial attack using dual-order flatness to escape deceptive loss regions and boost cross-model transferability

Input Manipulation Attack vision
PDF
attack arXiv Oct 20, 2025 · Oct 2025

Multimodal Safety Is Asymmetric: Cross-Modal Exploits Unlock Black-Box MLLMs Jailbreaks

Xinkai Wang, Beibei Li, Zerui Shao et al. · Sichuan University · Tianjin University +1 more

Black-box RL-based jailbreak framework exploiting multimodal safety asymmetry to achieve 95%+ attack success on GPT-4o and Gemini

Prompt Injection nlp multimodal
1 citation PDF
defense arXiv Oct 2, 2025 · Oct 2025

Towards Imperceptible Adversarial Defense: A Gradient-Driven Shield against Facial Manipulations

Yue Li, Linying Xue, Dongdong Lin et al. · Huaqiao University · University of Florence +1 more

Embeds imperceptible adversarial perturbations in facial images to disrupt deepfake generation using gradient-projection conflict resolution

Output Integrity Attack vision generative
PDF
defense arXiv Sep 26, 2025 · Sep 2025

Zubov-Net: Adaptive Stability for Neural ODEs Reconciling Accuracy with Robustness

Chaoyang Luo, Yan Zou, Nanjing Huang · Sichuan University · Yibin University

Defends Neural ODEs against adversarial attacks via Lyapunov stability framework that adaptively controls regions of attraction

Input Manipulation Attack vision
PDF
attack arXiv Aug 14, 2025 · Aug 2025

Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation

Huizhen Shu, Xuying Li, Qirui Wang et al. · hydrox.ai · University of Washington +1 more

Jailbreaks LLMs by perturbing sparse autoencoder features in hidden layers to generate adversarial texts that evade safety defenses

Input Manipulation Attack Prompt Injection nlp
PDF
attack arXiv Aug 6, 2025 · Aug 2025

BadTime: An Effective Backdoor Attack on Multivariate Long-Term Time Series Forecasting

Kunlan Xiang, Haomiao Yang, Meng Hao et al. · University of Electronic Science and Technology of China · Singapore Management University +3 more

Proposes the first backdoor attack on multivariate time series forecasting, extending the attackable horizon 60× (to 720 timesteps) via lag-aware distributed triggers

Model Poisoning Data Poisoning Attack timeseries
PDF