ML Security Papers

Latest papers

29 papers

attack arXiv Apr 23, 2026 · 28d ago

Toward Efficient Membership Inference Attacks against Federated Large Language Models: A Projection Residual Approach

Guilin Deng, Silong Chen, Yuchuan Luo et al. · National University of Defense Technology · City University of Hong Kong +1 more

Gradient-based membership inference attack on federated LLMs achieving near-perfect accuracy via projection residual analysis

Membership Inference Attack nlp federated-learning

PDF Code

benchmark arXiv Apr 13, 2026 · 5w ago

NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild

Aleksandr Gushchin, Khaled Abud, Ekaterina Shumitskaya et al. · Lomonosov Moscow State University · Shenzhen University +14 more

Competition report on robust deepfake detection across 42 generators and 36 image transformations with 20 final solutions

Output Integrity Attack vision generative

PDF

defense arXiv Apr 11, 2026 · 5w ago

STARS: Skill-Triggered Audit for Request-Conditioned Invocation Safety in Agent Systems

Guijia Zhang, Shu Yang, Xilin Gong et al. · Shenzhen University · King Abdullah University of Science & Technology +2 more

Runtime risk-scoring system for LLM agent tool calls that detects indirect prompt injection attacks before execution

Prompt Injection Insecure Plugin Design Excessive Agency nlp

PDF Code

defense arXiv Apr 2, 2026 · 7w ago

Diffusion-Guided Adversarial Perturbation Injection for Generalizable Defense Against Facial Manipulations

Yue Li, Linying Xue, Kaiqing Lin et al. · National Huaqiao University · Shenzhen University +2 more

Diffusion-guided adversarial perturbation defense protecting facial images from deepfake manipulation in both white-box and black-box settings

Input Manipulation Attack vision generative

PDF

tool arXiv Mar 31, 2026 · 7w ago

GazeCLIP: Gaze-Guided CLIP with Adaptive-Enhanced Fine-Grained Language Prompt for Deepfake Attribution and Detection

Yaning Zhang, Linlin Shen, Zitong Yu et al. · Qilu University of Technology · Shenzhen University +2 more

Deepfake detector using gaze patterns and CLIP-based vision-language matching to attribute and detect GAN/diffusion-generated faces

Output Integrity Attack vision multimodal

PDF

defense arXiv Mar 27, 2026 · 7w ago

Gaussian Shannon: High-Precision Diffusion Model Watermarking Based on Communication

Yi Zhang, Hongbo Huang, Liang-Jie Zhang · Shenzhen University

Embeds bit-exact watermarks in diffusion model noise using error-correction codes for lossless AI image authentication and copyright tracking

Output Integrity Attack vision generative

PDF Code

tool arXiv Mar 24, 2026 · 8w ago

AgentFoX: LLM Agent-Guided Fusion with eXplainability for AI-Generated Image Detection

Yangxin Yu, Yue Zhou, Bin Li et al. · Shenzhen University · Sun Yat-Sen University +1 more

LLM-guided fusion framework that combines multiple forensic detectors to identify AI-generated images with explainable verdicts

Output Integrity Attack vision multimodal nlp

PDF

attack arXiv Mar 20, 2026 · 8w ago

Evolving Jailbreaks: Automated Multi-Objective Long-Tail Attacks on Large Language Models

Wenjing Hong, Zhonghua Rong, Li Wang et al. · Shenzhen University · Ltd +2 more

Automated multi-objective evolutionary search framework discovering diverse long-tail jailbreak attacks via encryption-decryption prompt transformations

Prompt Injection nlp

PDF

defense arXiv Mar 12, 2026 · 10w ago

ForensicZip: More Tokens are Better but Not Necessary in Forensic Vision-Language Models

Yingxin Lai, Zitong Yu, Jun Wang et al. · Great Bay University · Shenzhen University +2 more

Forensic-aware visual token pruning for deepfake/AIGC detection VLMs using Birth-Death Optimal Transport to preserve manipulation traces

Output Integrity Attack vision multimodal nlp

PDF Code

attack arXiv Feb 6, 2026 · Feb 2026

Universal Anti-forensics Attack against Image Forgery Detection via Multi-modal Guidance

Haipeng Li, Rongxuan Peng, Anwei Luo et al. · Shenzhen University · Nanyang Technological University +2 more

Adversarial perturbations that evade AI-generated content detectors by manipulating shared CLIP embeddings toward authentic anchors

Input Manipulation Attack Output Integrity Attack vision multimodal

PDF

defense arXiv Feb 2, 2026 · Feb 2026

MIRROR: Manifold Ideal Reference ReconstructOR for Generalizable AI-Generated Image Detection

Ruiqi Liu, Manni Cui, Ziheng Qin et al. · Institute of Automation · School of Advanced Interdisciplinary Sciences +7 more

Detects AI-generated images by projecting inputs to a real-image manifold and using reconstruction residuals as forgery signals, surpassing human experts

Output Integrity Attack vision generative

PDF Code

attack arXiv Feb 2, 2026 · Feb 2026

MarkCleaner: High-Fidelity Watermark Removal via Imperceptible Micro-Geometric Perturbation

Xiaoxi Kong, Jieyu Yuan, Pengdi Chen et al. · Shenzhen University · Nankai University

Removes semantic AI-image watermarks via micro-geometric perturbations that break phase alignment without semantic drift

Output Integrity Attack vision generative

PDF

defense arXiv Feb 2, 2026 · Feb 2026

Simplicity Prevails: The Emergence of Generalizable AIGI Detection in Visual Foundation Models

Yue Zhou, Xinan He, Kaiqing Lin et al. · Shenzhen University · NanChang University +1 more

Linear classifiers on frozen Vision Foundation Models outperform specialized AIGI detectors by 30%+ in realistic in-the-wild scenarios

Output Integrity Attack vision

PDF

defense arXiv Jan 29, 2026 · Jan 2026

MPF-Net: Exposing High-Fidelity AI-Generated Video Forgeries via Hierarchical Manifold Deviation and Micro-Temporal Fluctuations

Xinan He, Kaiqing Lin, Yue Zhou et al. · NanChang University · Shenzhen University +3 more

Detects AI-generated video forgeries via hierarchical dual-path analysis of manifold deviations and structured inter-frame residual fingerprints

Output Integrity Attack vision

PDF

defense arXiv Dec 7, 2025 · Dec 2025

AlignGemini: Generalizable AI-Generated Image Detection Through Task-Model Alignment

Ruoxin Chen, Jiahui Gao, Kaiqing Lin et al. · Tencent · East China University of Science and Technology +2 more

Proposes task-model alignment combining VLMs and vision models for generalizable AI-generated image detection

Output Integrity Attack vision multimodal

PDF

defense arXiv Nov 24, 2025 · Nov 2025

Towards Generalizable Deepfake Detection via Forgery-aware Audio-Visual Adaptation: A Variational Bayesian Approach

Fan Nie, Jiangqun Ni, Jian Zhang et al. · Sun Yat-Sen University · Pengcheng Laboratory +4 more

Novel variational Bayesian framework detects audio-visual deepfakes by modeling cross-modal inconsistencies as Gaussian latent variables

Output Integrity Attack multimodal vision audio generative

1 citation PDF

defense arXiv Nov 13, 2025 · Nov 2025

Fairness-Aware Deepfake Detection: Leveraging Dual-Mechanism Optimization

Feng Ding, Wenhui Yi, Yunpeng Zhou et al. · NanChang University · Shenzhen University +1 more

Fairness-aware deepfake detector using channel decoupling and distribution alignment to reduce demographic bias without sacrificing accuracy

Output Integrity Attack vision

PDF

attack arXiv Nov 11, 2025 · Nov 2025

Why does weak-OOD help? A Further Step Towards Understanding Jailbreaking VLMs

Yuxuan Zhou, Yuzhao Peng, Yang Bai et al. · Tsinghua University · ByteDance +4 more

Analyzes why mild OOD image manipulation best jailbreaks VLMs, then proposes JOCR, an OCR-based visual attack outperforming SOTA baselines

Input Manipulation Attack Prompt Injection vision multimodal nlp

PDF

attack arXiv Nov 10, 2025 · Nov 2025

JPRO: Automated Multimodal Jailbreaking via Multi-Agent Collaboration Framework

Yuxuan Zhou, Yang Bai, Kuofeng Gao et al. · Tsinghua University · ByteDance +1 more

Multi-agent framework automates black-box jailbreaking of VLMs via coordinated image-text pair generation, achieving 60%+ ASR on GPT-4o

Prompt Injection multimodal nlp

PDF

defense arXiv Nov 10, 2025 · Nov 2025

Improving Deepfake Detection with Reinforcement Learning-Based Adaptive Data Augmentation

Yuxuan Zhou, Tao Yu, Wen Huang et al. · Tsinghua University · CASIA +1 more

Trains deepfake detectors with RL-adaptive curriculum augmentation and causal inference to generalize across unseen forgery domains

Output Integrity Attack vision

PDF
