ML Security Papers


Latest papers

9 papers

attack arXiv Feb 16, 2026 · 7w ago

Multi-Turn Adaptive Prompting Attack on Large Vision-Language Models

In Chong Choi, Jiacheng Zhang, Feng Liu et al. · The University of Melbourne · The University of Adelaide

Multi-turn jailbreak attack on VLMs that adaptively alternates text and image inputs to bypass safety alignment

Prompt Injection multimodal nlp

PDF Code

defense arXiv Feb 5, 2026 · 8w ago

ALIEN: Analytic Latent Watermarking for Controllable Generation

Liangqi Lei, Keke Gai, Jing Yu et al. · Beijing Institute of Technology · Minzu University of China +1 more

Embeds analytically-derived watermarks in diffusion model latents for content provenance with improved quality and attack robustness

Output Integrity Attack vision generative

PDF Code

defense arXiv Nov 24, 2025 · Nov 2025

Re-Key-Free, Risky-Free: Adaptable Model Usage Control

Zihan Wang, Zhongkui Ma, Xinguo Feng et al. · The University of Queensland · CSIRO’s Data61 +3 more

Defends model IP with key-locked weights that survive fine-tuning, keeping unauthorized inference at near-random performance

Model Theft vision

1 citation PDF

Deep neural networks (DNNs) have become valuable intellectual property of model owners, due to the substantial resources required for their development. To protect these assets in the deployed environment, recent research has proposed model usage control mechanisms to ensure models cannot be used without proper authorization. These methods typically lock the utility of the model by embedding an access key into its parameters. However, they often assume static deployment, and largely fail to withstand continual post-deployment model updates, such as fine-tuning or task-specific adaptation. In this paper, we propose ADALOC, to endow key-based model usage control with adaptability during model evolution. It strategically selects a subset of weights as an intrinsic access key, which enables all model updates to be confined to this key throughout the evolution lifecycle. ADALOC enables using the access key to restore the keyed model to the latest authorized states without redistributing the entire network (i.e., adaptation), and frees the model owner from full re-keying after each model update (i.e., lock preservation). We establish a formal foundation to underpin ADALOC, providing crucial bounds such as the errors introduced by updates restricted to the access key. Experiments on standard benchmarks, such as CIFAR-100, Caltech-256, and Flowers-102, and modern architectures, including ResNet, DenseNet, and ConvNeXt, demonstrate that ADALOC achieves high accuracy under significant updates while retaining robust protections. Specifically, authorized usages consistently achieve strong task-specific performance, while unauthorized usage accuracy drops to near-random guessing levels (e.g., 1.01% on CIFAR-100), compared to up to 87.01% without ADALOC. This shows that ADALOC can offer a practical solution for adaptive and protected DNN deployment in evolving real-world scenarios.

cnn The University of Queensland · CSIRO’s Data61 · City University of Hong Kong +2 more

PDF arXiv DOI
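The ADALOC abstract above describes a subset of weights acting as an access key, with all later updates confined to that subset. A minimal sketch of that idea follows, on a flat list of toy weights. It is not the authors' implementation: the selection rule (largest-magnitude weights), the noise used for locking, and every function name here are assumptions made purely for illustration.

```python
import random


def select_key_indices(weights, key_size):
    """Hypothetical selection rule: take the largest-magnitude weights.
    The abstract only says the subset is chosen 'strategically'."""
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return sorted(ranked[:key_size])


def lock(weights, key_indices, rng):
    """Return (locked_weights, access_key). Key positions are overwritten
    with noise (an assumption), so the locked copy alone is degraded."""
    access_key = {i: weights[i] for i in key_indices}
    locked = list(weights)
    for i in key_indices:
        locked[i] = rng.uniform(-1.0, 1.0)
    return locked, access_key


def unlock(locked, access_key):
    """Restore an authorized model by writing the key values back."""
    restored = list(locked)
    for i, value in access_key.items():
        restored[i] = value
    return restored


def update_key(access_key, gradients, lr):
    """Apply a gradient step restricted to the key positions only."""
    return {i: v - lr * gradients[i] for i, v in access_key.items()}


rng = random.Random(0)
weights = [0.9, -0.1, 0.05, -0.8, 0.3, 0.02]  # toy stand-in for model parameters
key_indices = select_key_indices(weights, key_size=2)
locked, key_v1 = lock(weights, key_indices, rng)

# A later model update touches only the key, so the locked weights that were
# already distributed stay valid and only the new key needs to be issued.
gradients = [0.5] * len(weights)
key_v2 = update_key(key_v1, gradients, lr=0.1)
updated = unlock(locked, key_v2)
```

In this toy setting the locked list never changes after distribution: issuing `key_v2` is enough to move an authorized user to the updated model, which mirrors the "no full re-keying" property the abstract claims.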

attack arXiv Nov 23, 2025 · Nov 2025

Robust Physical Adversarial Patches Using Dynamically Optimized Clusters

Harrison Bagley, Will Meakin, Simon Lucey et al. · The University of Adelaide · SmartSat CRC +1 more

Superpixel-based regularization makes physical adversarial patches scale-resilient via differentiable SLIC clustering during optimization

Input Manipulation Attack vision

PDF

defense arXiv Nov 10, 2025 · Nov 2025

E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis

Zhisheng Zhang, Derui Wang, Yifan Mi et al. · Tsinghua University · Beijing University of Posts and Telecommunications +4 more

Proactive adversarial audio perturbations disrupt LLM-based voice cloning by targeting speaker encoders and ASR transcription simultaneously

Input Manipulation Attack Output Integrity Attack audio nlp

PDF Code

defense arXiv Oct 30, 2025 · Oct 2025

ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models

Weifei Jin, Yuxin Cao, Junjie Su et al. · Beijing University of Posts and Telecommunications · National University of Singapore +3 more

Defends Audio-Language Models against audio-based jailbreaks using universal acoustic perturbations that activate inherent model safety shortcuts

Input Manipulation Attack Prompt Injection audio multimodal nlp

1 citation PDF Code

benchmark arXiv Oct 21, 2025 · Oct 2025

The Trust Paradox in LLM-Based Multi-Agent Systems: When Collaboration Becomes a Security Vulnerability

Zijie Xu, Minfeng Qi, Shiqing Wu et al. · Minzu University of China · City University of Macau +1 more

Empirically validates that higher inter-agent trust in LLM multi-agent systems increases sensitive data over-exposure and authorization boundary violations

Excessive Agency Sensitive Information Disclosure nlp

2 citations PDF

benchmark arXiv Aug 18, 2025 · Aug 2025

Systematic Analysis of MCP Security

Yongjian Guo, Puzhuo Liu, Wanlun Ma et al. · Tsinghua University · Ant Group +3 more

Catalogs 31 MCP attack methods into a unified library, empirically revealing LLM agent vulnerabilities in tool-use protocols

Insecure Plugin Design Prompt Injection nlp

PDF

defense arXiv Aug 14, 2025 · Aug 2025

A Vision-Language Pre-training Model-Guided Approach for Mitigating Backdoor Attacks in Federated Learning

Keke Gai, Dongjue Wang, Jing Yu et al. · Beijing Institute of Technology · Minzu University of China +1 more

Defends federated learning against backdoor attacks under Non-IID data, using CLIP zero-shot alignment to eliminate trigger-label correlations

Model Poisoning vision federated-learning multimodal

PDF Code

