Latest papers

4 papers
defense · arXiv · Mar 19, 2026

Complementary Text-Guided Attention for Zero-Shot Adversarial Robustness

Lu Yu, Haiyang Zhang, Changsheng Xu · Tianjin University of Technology · Chinese Academy of Sciences +1 more

Defends CLIP against adversarial examples using complementary text-guided attention to maintain zero-shot generalization while improving robustness

Input Manipulation Attack · vision · nlp · multimodal
PDF · Code
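The idea of text-guided attention can be sketched as a toy: patch features are pooled with weights given by their similarity to the class-text embedding, so perturbations on text-irrelevant patches contribute less to the final representation. Everything here (`text_guided_pool`, the 2-D vectors, the temperature) is an invented stand-in, not the paper's actual CLIP-based method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def text_guided_pool(patches, text_emb, temp=0.5):
    """Softmax-attend over patch features, scoring each patch by its
    cosine similarity to the text embedding (toy stand-in for
    text-guided attention)."""
    scores = [cosine(p, text_emb) / temp for p in patches]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    tot = sum(w)
    w = [x / tot for x in w]
    dim = len(patches[0])
    # weighted average of patch features
    return [sum(w[i] * patches[i][d] for i in range(len(patches)))
            for d in range(dim)]

# A patch aligned with the text embedding dominates the pooled feature:
pooled = text_guided_pool([[1.0, 0.0], [0.0, 1.0]], text_emb=[1.0, 0.0])
```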
attack · arXiv · Jan 30, 2026

The Illusion of Forgetting: Attack Unlearned Diffusion via Initial Latent Variable Optimization

Manyi Li, Yufan Liu, Lai Jiang et al. · University of the Chinese Academy of Sciences · Chinese Academy of Sciences +2 more

Attacks machine unlearning defenses in diffusion models by optimizing initial latent variables to reactivate erased NSFW knowledge

Input Manipulation Attack · vision · generative
PDF · Code
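The attack's premise can be illustrated with a toy: if an "unlearned" generator still maps some initial latents to the erased concept, then searching over the starting latent can reactivate it. The `concept_score` function and the hill-climbing search below are invented stand-ins (the paper optimizes latents of a real diffusion model), shown only to make the latent-optimization idea concrete.

```python
import random

def concept_score(z):
    """Stand-in for a detector of the erased concept in the generated
    output; here it peaks at a hidden latent region around (2, -1)."""
    return -((z[0] - 2.0) ** 2 + (z[1] + 1.0) ** 2)

def optimize_latent(score_fn, steps=2000, sigma=0.1, seed=0):
    """Random-search hill climbing over the initial latent: propose a
    Gaussian perturbation, keep it whenever the concept score rises."""
    rng = random.Random(seed)
    z = [0.0, 0.0]
    best = score_fn(z)
    for _ in range(steps):
        cand = [zi + rng.gauss(0.0, sigma) for zi in z]
        s = score_fn(cand)
        if s > best:
            z, best = cand, s
    return z, best

# Starting from the benign origin, the search drifts toward the latent
# region where the "erased" concept still lives:
z, best = optimize_latent(concept_score)
```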
benchmark · arXiv · Jan 30, 2026

Lingua-SafetyBench: A Benchmark for Safety Evaluation of Multilingual Vision-Language Models

Enyi Shi, Pengyang Shao, Yanxin Zhang et al. · Nanjing University of Science and Technology · National University of Singapore +3 more

Multilingual multimodal safety benchmark revealing cross-lingual asymmetries in VLLM jailbreak susceptibility across 10 languages and 11 models

Prompt Injection · multimodal · nlp
PDF · Code
attack · arXiv · Sep 8, 2025

Mask-GCG: Are All Tokens in Adversarial Suffixes Necessary for Jailbreak Attacks?

Junjie Mu, Zonghao Ying, Zhekui Fan et al. · Beihang University · 360 AI Security Lab +4 more

Identifies redundant tokens in GCG adversarial suffixes via learnable masking, reducing LLM jailbreak attack time by 16.8%.

Input Manipulation Attack · Prompt Injection · nlp
PDF
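The masking idea can be sketched with a toy: given an adversarial suffix, drop every token whose removal does not worsen the attack objective, keeping only the tokens that matter. The `toy_attack_loss` below is an invented stand-in for the target LLM's jailbreak loss, and the greedy pruning loop mimics the effect of the paper's learnable 0/1 mask without implementing it.

```python
def toy_attack_loss(suffix_tokens):
    """Stand-in for the jailbreak objective: lower is a stronger attack.
    Pretend only the tokens 'x' and 'z' matter; all others are redundant."""
    useful = {"x", "z"}
    return 1.0 / (1 + sum(t in useful for t in suffix_tokens))

def prune_suffix(suffix, loss_fn, tol=1e-9):
    """Greedily drop each suffix token whose removal does not increase
    the loss, emulating a learned mask over suffix positions."""
    kept = list(suffix)
    i = 0
    while i < len(kept):
        trial = kept[:i] + kept[i + 1:]
        if loss_fn(trial) <= loss_fn(kept) + tol:
            kept = trial   # token was redundant; drop it
        else:
            i += 1         # token contributes to the attack; keep it
    return kept

# Only the tokens the toy loss actually depends on survive pruning:
pruned = prune_suffix(["a", "x", "b", "c", "z", "d"], toy_attack_loss)
# → ["x", "z"]
```

A shorter suffix means fewer positions for GCG's per-token candidate search, which is where the reported speedup comes from.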