Latest papers

4 papers
defense · arXiv · Mar 31, 2026

AGFT: Alignment-Guided Fine-Tuning for Zero-Shot Adversarial Robustness of Vision-Language Models

Yubo Cui, Xianchao Guan, Zijun Xiong et al. · Harbin Institute of Technology · Shenzhen Loop Area Institute

Adversarial fine-tuning framework that preserves vision-language alignment while defending CLIP against adversarial perturbations in zero-shot settings

Input Manipulation Attack · vision · nlp · multimodal
PDF Code
attack · arXiv · Mar 18, 2026

TINA: Text-Free Inversion Attack for Unlearned Text-to-Image Diffusion Models

Qianlong Xiang, Miao Zhang, Haoyu Zhang et al. · Harbin Institute of Technology · City University of Hong Kong +3 more

Text-free inversion attack that recovers supposedly erased concepts from unlearned diffusion models by exploiting the visual knowledge that persists after concept erasure

Model Inversion Attack · vision · generative
PDF
attack · arXiv · Mar 2, 2026

VidDoS: Universal Denial-of-Service Attack on Video-based Large Language Models

Duoxun Tang, Dasen Dai, Jiyao Wang et al. · Tsinghua University · The Chinese University of Hong Kong +4 more

Universal sponge attack on Video-LLMs that inflates token generation by 205× and inference latency by 15× via optimized adversarial video-frame triggers

Input Manipulation Attack · Model Denial of Service · multimodal · vision · nlp
PDF Code
attack · arXiv · Sep 17, 2025

A Simple and Efficient Jailbreak Method Exploiting LLMs' Helpfulness

Xuan Luo, Yue Wang, Zefeng He et al. · Harbin Institute of Technology · Hong Kong Polytechnic University +2 more

Jailbreaks LLMs by reframing harmful queries as educational learning questions, bypassing safety alignment across 22 models

Prompt Injection · nlp
PDF Code