Latest papers

9 papers
attack arXiv Feb 16, 2026

Multi-Turn Adaptive Prompting Attack on Large Vision-Language Models

In Chong Choi, Jiacheng Zhang, Feng Liu et al. · The University of Melbourne · The University of Adelaide

Multi-turn jailbreak attack on VLMs that adaptively alternates text and image inputs to bypass safety alignment

Prompt Injection multimodal nlp
PDF Code
defense arXiv Feb 5, 2026

ALIEN: Analytic Latent Watermarking for Controllable Generation

Liangqi Lei, Keke Gai, Jing Yu et al. · Beijing Institute of Technology · Minzu University of China +1 more

Embeds analytically-derived watermarks in diffusion model latents for content provenance with improved quality and attack robustness

Output Integrity Attack vision generative
PDF Code
defense arXiv Nov 24, 2025

Re-Key-Free, Risky-Free: Adaptable Model Usage Control

Zihan Wang, Zhongkui Ma, Xinguo Feng et al. · The University of Queensland · CSIRO’s Data61 +3 more

Defends model IP with key-locked weights that survive fine-tuning, keeping unauthorized inference at near-random performance

Model Theft vision
1 citation PDF
attack arXiv Nov 23, 2025

Robust Physical Adversarial Patches Using Dynamically Optimized Clusters

Harrison Bagley, Will Meakin, Simon Lucey et al. · The University of Adelaide · SmartSat CRC +1 more

Superpixel-based regularization makes physical adversarial patches scale-resilient via differentiable SLIC clustering during optimization

Input Manipulation Attack vision
PDF
defense arXiv Nov 10, 2025

E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis

Zhisheng Zhang, Derui Wang, Yifan Mi et al. · Tsinghua University · Beijing University of Posts and Telecommunications +4 more

Proactive adversarial audio perturbations disrupt LLM-based voice cloning by targeting speaker encoders and ASR transcription simultaneously

Input Manipulation Attack Output Integrity Attack audio nlp
PDF Code
defense arXiv Oct 30, 2025

ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models

Weifei Jin, Yuxin Cao, Junjie Su et al. · Beijing University of Posts and Telecommunications · National University of Singapore +3 more

Defends Audio-Language Models against audio-based jailbreaks using universal acoustic perturbations that activate inherent model safety shortcuts

Input Manipulation Attack Prompt Injection audio multimodal nlp
1 citation PDF Code
benchmark arXiv Oct 21, 2025

The Trust Paradox in LLM-Based Multi-Agent Systems: When Collaboration Becomes a Security Vulnerability

Zijie Xu, Minfeng Qi, Shiqing Wu et al. · Minzu University of China · City University of Macau +1 more

Empirically validates that higher inter-agent trust in LLM multi-agent systems increases sensitive data over-exposure and authorization boundary violations

Excessive Agency Sensitive Information Disclosure nlp
2 citations PDF
benchmark arXiv Aug 18, 2025

Systematic Analysis of MCP Security

Yongjian Guo, Puzhuo Liu, Wanlun Ma et al. · Tsinghua University · Ant Group +3 more

Catalogs 31 MCP attack methods into a unified library, empirically revealing LLM agent vulnerabilities in tool-use protocols

Insecure Plugin Design Prompt Injection nlp
PDF
defense arXiv Aug 14, 2025

A Vision-Language Pre-training Model-Guided Approach for Mitigating Backdoor Attacks in Federated Learning

Keke Gai, Dongjue Wang, Jing Yu et al. · Beijing Institute of Technology · Minzu University of China +1 more

Defends federated learning backdoors under Non-IID data using CLIP zero-shot alignment to eliminate trigger-label correlations

Model Poisoning vision federated-learning multimodal
PDF Code