Latest papers

34 papers
tool arXiv Apr 5, 2026 · 3d ago

ATSS: Detecting AI-Generated Videos via Anomalous Temporal Self-Similarity

Hang Wang, Chao Shen, Lei Zhang et al. · Xi’an Jiaotong University · The Hong Kong Polytechnic University +1 more

Detects AI-generated videos by exploiting anomalous temporal self-similarity patterns across visual and semantic modalities

Output Integrity Attack · vision · multimodal
PDF Code
attack arXiv Apr 3, 2026 · 5d ago

LogicPoison: Logical Attacks on Graph Retrieval-Augmented Generation

Yilin Xiao, Jin Chen, Qinggang Zhang et al. · Southwestern University of Finance and Economics · The Hong Kong Polytechnic University

Graph topology poisoning attack that disrupts GraphRAG logical reasoning by swapping entities to sever multi-hop inference paths

Data Poisoning Attack · Input Manipulation Attack · Prompt Injection · nlp · graph
PDF Code
defense arXiv Mar 9, 2026 · 4w ago

Privacy-Preserving End-to-End Full-Duplex Speech Dialogue Models

Nikita Kuzmin, Tao Zhong, Jiajun Deng et al. · Nanyang Technological University · A*STAR +3 more

Defends against speaker re-identification attacks on LLM speech dialogue models using streaming voice anonymization

Sensitive Information Disclosure · audio · nlp
PDF
attack arXiv Feb 10, 2026 · 8w ago

Understanding and Enhancing Encoder-based Adversarial Transferability against Large Vision-Language Models

Xinwei Zhang, Li Bai, Tianwei Zhang et al. · The Hong Kong Polytechnic University · Nanyang Technological University +1 more

Proposes SGMA, a transferable adversarial visual attack on LVLMs targeting semantically critical regions to disrupt cross-modal grounding

Input Manipulation Attack · Prompt Injection · vision · multimodal · nlp
PDF
defense arXiv Jan 29, 2026 · 9w ago

FIT: Defying Catastrophic Forgetting in Continual LLM Unlearning

Xiaoyu Xu, Minxin Du, Kun Fang et al. · The Hong Kong Polytechnic University · Ant Group

Defends continual LLM unlearning of PII, copyrighted material, and harmful content against adversarial recovery via relearning and quantization attacks

Sensitive Information Disclosure · nlp
PDF Code
attack arXiv Jan 29, 2026 · 9w ago

On the Adversarial Robustness of Large Vision-Language Models under Visual Token Compression

Xinwei Zhang, Hangcheng Liu, Li Bai et al. · The Hong Kong Polytechnic University · Nanyang Technological University +1 more

Proposes CAGE, a compression-aware adversarial attack exposing that token-compressed VLM robustness is systematically overestimated by standard attacks

Input Manipulation Attack · vision · multimodal
PDF
attack arXiv Jan 20, 2026 · 11w ago

LURE: Latent Space Unblocking for Multi-Concept Reawakening in Diffusion Models

Mengyu Sun, Ziyuan Yang, Andrew Beng Jin Teoh et al. · Sichuan University · The Hong Kong Polytechnic University +1 more

Attacks concept erasure defenses in diffusion models by reconstructing latent space to reawaken multiple suppressed concepts simultaneously

Input Manipulation Attack · vision · generative
PDF Code
attack arXiv Jan 20, 2026 · 11w ago

Diffusion-Guided Backdoor Attacks in Real-World Reinforcement Learning

Tairan Huang, Qingqing Ye, Yulin Jin et al. · The Hong Kong Polytechnic University

Diffusion-generated floor patch triggers bypass real-world safety control stacks to reliably activate backdoors in RL robot policies

Model Poisoning · reinforcement-learning · vision
PDF
defense arXiv Jan 11, 2026 · 12w ago

United We Defend: Collaborative Membership Inference Defenses in Federated Learning

Li Bai, Junxu Liu, Sen Zhang et al. · The Hong Kong Polytechnic University · PolyU Research Centre for Privacy and Security Technologies in Future Smart Systems

Collaborative FL defense framework that limits local memorization to defeat trajectory-based membership inference attacks

Membership Inference Attack · federated-learning · vision
PDF Code
defense arXiv Jan 8, 2026 · Jan 2026

DP-MGTD: Privacy-Preserving Machine-Generated Text Detection via Adaptive Differentially Private Entity Sanitization

Lionel Z. Wang, Yusheng Zhao, Jiabin Luo et al. · Nanyang Technological University · The Hong Kong Polytechnic University +3 more

Privacy-preserving AI text detector using adaptive differential privacy entity sanitization that counter-intuitively boosts detection accuracy

Output Integrity Attack · nlp
PDF
defense arXiv Dec 18, 2025 · Dec 2025

DeContext as Defense: Safe Image Editing in Diffusion Transformers

Linghui Shen, Mingyue Cui, Xingyi Yang · The Hong Kong Polytechnic University

Defends personal images from AI in-context editing by injecting adversarial perturbations that disrupt cross-attention pathways in diffusion transformers

Input Manipulation Attack · vision · generative
PDF Code
defense arXiv Nov 29, 2025 · Nov 2025

Adversarial Signed Graph Learning with Differential Privacy

Haobin Ke, Sen Zhang, Qingqing Ye et al. · The Hong Kong Polytechnic University

Defends signed GNNs against link-stealing attacks using adversarial training and differential privacy with node-level guarantees

Membership Inference Attack · graph
PDF Code
attack arXiv Nov 19, 2025 · Nov 2025

HV-Attack: Hierarchical Visual Attack for Multimodal Retrieval Augmented Generation

Linyin Luo, Yujuan Ding, Yunshan Ma et al. · The Hong Kong Polytechnic University · Sun Yat-Sen University +1 more

Gradient-based adversarial image perturbations attack multimodal RAG systems by hierarchically disrupting cross-modal and semantic alignment

Input Manipulation Attack · Prompt Injection · vision · multimodal · nlp
1 citation PDF
defense arXiv Nov 15, 2025 · Nov 2025

ExplainableGuard: Interpretable Adversarial Defense for Large Language Models Using Chain-of-Thought Reasoning

Shaowei Guan, Yu Zhai, Zhengyu Zhang et al. · The Hong Kong Polytechnic University

Defends LLMs against adversarial text perturbations using DeepSeek-Reasoner CoT prompts that purify inputs and explain each defense decision

Input Manipulation Attack · Prompt Injection · nlp
PDF
attack arXiv Nov 14, 2025 · Nov 2025

Synthetic Voices, Real Threats: Evaluating Large Text-to-Speech Models in Generating Harmful Audio

Guangke Chen, Yuhui Wang, Shouling Ji et al. · Stony Brook University · Zhejiang University +1 more

Jailbreaks LALM-based TTS safety alignment via semantic obfuscation and audio-modality injection to generate harmful speech

Prompt Injection · audio · nlp · multimodal
PDF
attack arXiv Nov 12, 2025 · Nov 2025

SEBA: Sample-Efficient Black-Box Attacks on Visual Reinforcement Learning

Tairan Huang, Yulin Jin, Junxu Liu et al. · The Hong Kong Polytechnic University

Black-box adversarial attack on visual RL agents using GAN and shadow Q-model to minimize environment queries

Input Manipulation Attack · vision · reinforcement-learning
PDF
defense arXiv Nov 11, 2025 · Nov 2025

Class-feature Watermark: A Resilient Black-box Watermark Against Model Extraction Attacks

Yaxin Xiao, Qingqing Ye, Zi Liang et al. · The Hong Kong Polytechnic University · Huawei Technologies +1 more

Proposes WRK to break existing black-box model watermarks, then introduces CFW watermarking resilient to combined extraction and removal attacks

Model Theft · vision
PDF Code
defense CCS Nov 10, 2025 · Nov 2025

Harnessing Sparsification in Federated Learning: A Secure, Efficient, and Differentially Private Realization

Shuangqing Xu, Yifeng Zheng, Zhongyun Hua · Harbin Institute of Technology · The Hong Kong Polytechnic University

Defends FL against gradient inversion attacks via cryptographic sparse aggregation and differential privacy, beating ORAM by orders of magnitude

Model Inversion Attack · federated-learning
2 citations PDF
tool arXiv Oct 22, 2025 · Oct 2025

OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models

Thomas Wang, Haowen Li · OpenGuardrails.com · The Hong Kong Polytechnic University

Open-source LLM guardrails platform unifying prompt-injection defense, jailbreak detection, and PII redaction across 119 languages

Prompt Injection · Sensitive Information Disclosure · nlp
PDF Code
defense arXiv Oct 18, 2025 · Oct 2025

EDVD-LLaMA: Explainable Deepfake Video Detection via Multimodal Large Language Model Reasoning

Haoran Sun, Chen Cai, Huiping Zhuang et al. · The Hong Kong Polytechnic University · Nanyang Technological University +1 more

Explainable deepfake video detector using multimodal LLaMA with spatio-temporal chain-of-thought reasoning and facial hard constraints

Output Integrity Attack · vision · multimodal · nlp
PDF Code