Latest papers

26 papers
defense arXiv Mar 25, 2026 · 12d ago

AMIF: Authorizable Medical Image Fusion Model with Built-in Authentication

Jie Song, Jun Jia, Wei Sun et al. · Macao Polytechnic University · Shanghai Jiao Tong University +2 more

Medical image fusion model embedding visible copyright watermarks in outputs, removable only with authentication keys

Model Theft Output Integrity Attack vision multimodal
PDF
defense arXiv Mar 25, 2026 · 12d ago

DP^2-VL: Private Photo Dataset Protection by Data Poisoning for Vision-Language Models

Hongyi Miao, Jun Jia, Xincheng Wang et al. · Shandong University · Shanghai Jiao Tong University +4 more

Data poisoning defense that protects private photo datasets from VLM fine-tuning attacks that extract identity-affiliation relationships

Data Poisoning Attack Sensitive Information Disclosure vision nlp multimodal
PDF
defense arXiv Mar 18, 2026 · 19d ago

Evidence Packing for Cross-Domain Image Deepfake Detection with LVLMs

Yuxin Liu, Fei Wang, Kun Li et al. · Anhui University · Hefei Comprehensive National Science Center +2 more

Training-free deepfake detection using LVLMs that mines suspicious patch tokens via semantic clustering and frequency-noise anomaly scoring

Output Integrity Attack vision multimodal
PDF
defense arXiv Mar 11, 2026 · 26d ago

Layer Consistency Matters: Elegant Latent Transition Discrepancy for Generalizable Synthetic Image Detection

Yawen Yang, Feng Li, Shuqi Kong et al. · Hefei University of Technology

Detects AI-generated images by exploiting inter-layer latent representation inconsistencies unique to GAN/diffusion model outputs

Output Integrity Attack vision generative
PDF Code
tool arXiv Mar 9, 2026 · 28d ago

SWIFT: Sliding Window Reconstruction for Few-Shot Training-Free Generated Video Attribution

Chao Wang, Zijin Yang, Yaofei Wang et al. · University of Science and Technology of China · Hefei University of Technology

Few-shot, training-free video attribution tool traces generated videos to source models via sliding-window reconstruction loss signals

Output Integrity Attack vision generative
PDF Code
defense arXiv Mar 2, 2026 · 5w ago

Process Over Outcome: Cultivating Forensic Reasoning for Generalizable Multimodal Manipulation Detection

Yuchen Zhang, Yaxiong Wang, Kecheng Han et al. · Xi’an Jiaotong University · Hefei University of Technology +3 more

Proposes REFORM, a forensic-reasoning framework with curriculum learning and RL to generalize multimodal deepfake detection

Output Integrity Attack multimodal vision nlp generative
PDF
benchmark arXiv Feb 26, 2026 · 5w ago

Delving into Adversarial Transferability on Image Classification: Review, Benchmark, and Evaluation

Xiaosen Wang, Zhijin Ge, Bohan Liu et al. · Huazhong University of Science and Technology · Xidian University +3 more

Surveys 100+ transfer-based adversarial attacks, proposes unified benchmark framework to address unfair comparisons in the field

Input Manipulation Attack vision
PDF Code
tool arXiv Feb 11, 2026 · 7w ago

OmniVL-Guard: Towards Unified Vision-Language Forgery Detection and Grounding via Balanced RL

Jinjie Shen, Jing Wu, Yaxiong Wang et al. · Hefei University of Technology · Wuhan University

Unified multimodal forgery detection and grounding system using balanced RL to handle text, image, and video fakery simultaneously

Output Integrity Attack multimodal vision nlp
PDF Code
defense arXiv Jan 22, 2026 · 10w ago

Data-Free Privacy-Preserving for LLMs via Model Inversion and Selective Unlearning

Xinjie Zhou, Zhihui Yang, Lechao Cheng et al. · Zhejiang University · Hefei University of Technology

Defends against LLM PII memorization by inverting the model to synthesize pseudo-PII, then selectively unlearning it via LoRA

Model Inversion Attack Sensitive Information Disclosure nlp
PDF
defense arXiv Jan 13, 2026 · 11w ago

DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection

Zhenhua Xu, Yiran Zhao, Mengting Zhong et al. · Zhejiang University · Binjiang Institute of Zhejiang University +3 more

Hierarchical backdoor fingerprinting embeds nested stylistic and semantic triggers in LLMs to prove ownership against black-box theft

Model Theft nlp
3 citations PDF Code
defense arXiv Dec 14, 2025 · Dec 2025

Open-World Deepfake Attribution via Confidence-Aware Asymmetric Learning

Haiyang Zheng, Nan Pu, Wenjing Li et al. · University of Trento · Hefei University of Technology

Novel open-world deepfake attribution framework that identifies source forgery models for both known and novel synthetic face types

Output Integrity Attack vision
1 citation PDF Code
survey arXiv Dec 6, 2025 · Dec 2025

Degrading Voice: A Comprehensive Overview of Robust Voice Conversion Through Input Manipulation

Xining Song, Zhihua Wei, Rui Wang et al. · Tongji University · iFLYTEK +2 more

Surveys adversarial, noise, and perturbation attacks on voice conversion models plus defenses, evaluating robustness across four speech quality dimensions

Input Manipulation Attack audio
1 citation PDF
benchmark arXiv Nov 29, 2025 · Nov 2025

MVAD: A Comprehensive Multimodal Video-Audio Dataset for AIGC Detection

Mengxue Hu, Yunfeng Diao, Changtao Miao et al. · Hefei University of Technology · Ant Group +1 more

Introduces MVAD, the first general-purpose dataset for detecting AI-generated multimodal video-audio content across diverse generators and forgery patterns

Output Integrity Attack vision audio multimodal generative
1 citation PDF Code
defense arXiv Nov 25, 2025 · Nov 2025

Harmonious Parameter Adaptation in Continual Visual Instruction Tuning for Safety-Aligned MLLMs

Ziqi Wang, Chang Che, Qi Wang et al. · Hefei University of Technology · Tsinghua University +1 more

Defends safety alignment of multimodal LLMs against degradation during continual visual fine-tuning via orthogonal parameter adaptation

Transfer Learning Attack Prompt Injection vision nlp multimodal
1 citation PDF
defense arXiv Nov 25, 2025 · Nov 2025

DLADiff: A Dual-Layer Defense Framework against Fine-Tuning and Zero-Shot Customization of Diffusion Models

Jun Jia, Hongyi Miao, Yingjie Zhou et al. · Shanghai Jiao Tong University · Shandong University +2 more

Defends facial images from diffusion model customization by adding dual-layer adversarial perturbations that disrupt both fine-tuning and zero-shot identity generation

Output Integrity Attack vision generative
PDF
defense arXiv Nov 25, 2025 · Nov 2025

Adapter Shield: A Unified Framework with Built-in Authentication for Preventing Unauthorized Zero-Shot Image-to-Image Generation

Jun Jia, Hongyi Miao, Yingjie Zhou et al. · Shandong University · Shanghai Jiao Tong University +2 more

Adversarial perturbation defense that disrupts zero-shot diffusion generation of faces and styles while permitting authenticated access via reversible embedding encryption

Input Manipulation Attack Output Integrity Attack vision generative
PDF
benchmark arXiv Nov 24, 2025 · Nov 2025

Evaluating Dataset Watermarking for Fine-tuning Traceability of Customized Diffusion Models: A Comprehensive Benchmark and Removal Approach

Xincheng Wang, Hanchi Sun, Wenjun Sun et al. · Donghua University · Shanghai Jiao Tong University +3 more

Benchmarks dataset watermarking schemes for diffusion model traceability and proposes a removal attack that fully defeats them

Output Integrity Attack vision generative
PDF
defense arXiv Nov 16, 2025 · Nov 2025

FLClear: Visually Verifiable Multi-Client Watermarking for Federated Learning

Chen Gu, Yingying Sun, Yifan She et al. · Hefei University of Technology

Embeds visually verifiable, collision-free ownership watermarks in federated learning models to defend against malicious server IP theft

Model Theft federated-learning
PDF
defense arXiv Oct 29, 2025 · Oct 2025

EIRES: Training-free AI-Generated Image Detection via Edit-Induced Reconstruction Error Shift

Wan Jiang, Jing Yan, Xiaojing Chen et al. · Hefei University of Technology · Anhui University +1 more

Training-free AI-generated image detector exploiting asymmetric reconstruction error shifts induced by structural edits

Output Integrity Attack vision generative
1 citation PDF
attack arXiv Sep 28, 2025 · Sep 2025

Formalization Driven LLM Prompt Jailbreaking via Reinforcement Learning

Zhaoqi Wang, Daqing He, Zijian Zhang et al. · Beijing Institute of Technology · Hefei University of Technology +1 more

Attacks LLM alignment with RL-driven formalization of jailbreak prompts combined with GraphRAG knowledge reuse

Prompt Injection nlp
PDF