Latest papers

34 papers
tool arXiv Apr 5, 2026 · 3d ago

ATSS: Detecting AI-Generated Videos via Anomalous Temporal Self-Similarity

Hang Wang, Chao Shen, Lei Zhang et al. · Xi’an Jiaotong University · The Hong Kong Polytechnic University +1 more

Detects AI-generated videos by exploiting anomalous temporal self-similarity patterns across visual and semantic modalities

Output Integrity Attack · vision · multimodal
PDF Code
attack arXiv Apr 3, 2026 · 5d ago

LogicPoison: Logical Attacks on Graph Retrieval-Augmented Generation

Yilin Xiao, Jin Chen, Qinggang Zhang et al. · Southwestern University of Finance and Economics · The Hong Kong Polytechnic University

Graph topology poisoning attack that disrupts GraphRAG logical reasoning by swapping entities to sever multi-hop inference paths

Data Poisoning Attack · Input Manipulation Attack · Prompt Injection · nlp · graph
PDF Code
defense arXiv Mar 9, 2026 · 4w ago

Privacy-Preserving End-to-End Full-Duplex Speech Dialogue Models

Nikita Kuzmin, Tao Zhong, Jiajun Deng et al. · Nanyang Technological University · A*STAR +3 more

Defends against speaker re-identification attacks on LLM speech dialogue models using streaming voice anonymization

Sensitive Information Disclosure · audio · nlp
PDF
attack arXiv Feb 10, 2026 · 8w ago

Understanding and Enhancing Encoder-based Adversarial Transferability against Large Vision-Language Models

Xinwei Zhang, Li Bai, Tianwei Zhang et al. · The Hong Kong Polytechnic University · Nanyang Technological University +1 more

Proposes SGMA, a transferable adversarial visual attack on LVLMs targeting semantically critical regions to disrupt cross-modal grounding

Input Manipulation Attack · Prompt Injection · vision · multimodal · nlp
PDF
defense arXiv Jan 29, 2026 · 9w ago

FIT: Defying Catastrophic Forgetting in Continual LLM Unlearning

Xiaoyu Xu, Minxin Du, Kun Fang et al. · The Hong Kong Polytechnic University · Ant Group

Defends continual LLM unlearning of PII, copyrighted material, and harmful content against adversarial recovery via relearning and quantization attacks

Sensitive Information Disclosure · nlp
PDF Code
attack arXiv Jan 29, 2026 · 9w ago

On the Adversarial Robustness of Large Vision-Language Models under Visual Token Compression

Xinwei Zhang, Hangcheng Liu, Li Bai et al. · The Hong Kong Polytechnic University · Nanyang Technological University +1 more

Proposes CAGE, a compression-aware adversarial attack exposing that token-compressed VLM robustness is systematically overestimated by standard attacks

Input Manipulation Attack · vision · multimodal
PDF
attack arXiv Jan 20, 2026 · 11w ago

LURE: Latent Space Unblocking for Multi-Concept Reawakening in Diffusion Models

Mengyu Sun, Ziyuan Yang, Andrew Beng Jin Teoh et al. · Sichuan University · The Hong Kong Polytechnic University +1 more

Attacks concept erasure defenses in diffusion models by reconstructing latent space to reawaken multiple suppressed concepts simultaneously

Input Manipulation Attack · vision · generative
PDF Code
attack arXiv Jan 20, 2026 · 11w ago

Diffusion-Guided Backdoor Attacks in Real-World Reinforcement Learning

Tairan Huang, Qingqing Ye, Yulin Jin et al. · The Hong Kong Polytechnic University

Diffusion-generated floor patch triggers bypass real-world safety control stacks to reliably activate backdoors in RL robot policies

Model Poisoning · reinforcement-learning · vision
PDF
defense arXiv Jan 11, 2026 · 12w ago

United We Defend: Collaborative Membership Inference Defenses in Federated Learning

Li Bai, Junxu Liu, Sen Zhang et al. · The Hong Kong Polytechnic University · PolyU Research Centre for Privacy and Security Technologies in Future Smart Systems

Collaborative FL defense framework that limits local memorization to defeat trajectory-based membership inference attacks

Membership Inference Attack · federated-learning · vision
PDF Code
defense arXiv Jan 8, 2026 · Jan 2026

DP-MGTD: Privacy-Preserving Machine-Generated Text Detection via Adaptive Differentially Private Entity Sanitization

Lionel Z. Wang, Yusheng Zhao, Jiabin Luo et al. · Nanyang Technological University · The Hong Kong Polytechnic University +3 more

Privacy-preserving AI text detector using adaptive differential privacy entity sanitization that counter-intuitively boosts detection accuracy

Output Integrity Attack · nlp
PDF
defense arXiv Dec 18, 2025 · Dec 2025

DeContext as Defense: Safe Image Editing in Diffusion Transformers

Linghui Shen, Mingyue Cui, Xingyi Yang · The Hong Kong Polytechnic University

Defends personal images from AI in-context editing by injecting adversarial perturbations that disrupt cross-attention pathways in diffusion transformers

Input Manipulation Attack · vision · generative
PDF Code
defense arXiv Nov 29, 2025 · Nov 2025

Adversarial Signed Graph Learning with Differential Privacy

Haobin Ke, Sen Zhang, Qingqing Ye et al. · The Hong Kong Polytechnic University

Defends signed GNNs against link-stealing attacks using adversarial training and differential privacy with node-level guarantees

Membership Inference Attack · graph
PDF Code
attack arXiv Nov 19, 2025 · Nov 2025

HV-Attack: Hierarchical Visual Attack for Multimodal Retrieval Augmented Generation

Linyin Luo, Yujuan Ding, Yunshan Ma et al. · The Hong Kong Polytechnic University · Sun Yat-Sen University +1 more

Gradient-based adversarial image perturbations attack multimodal RAG systems by hierarchically disrupting cross-modal and semantic alignment

Input Manipulation Attack · Prompt Injection · vision · multimodal · nlp
1 citation PDF
defense arXiv Nov 15, 2025 · Nov 2025

ExplainableGuard: Interpretable Adversarial Defense for Large Language Models Using Chain-of-Thought Reasoning

Shaowei Guan, Yu Zhai, Zhengyu Zhang et al. · The Hong Kong Polytechnic University

Defends LLMs against adversarial text perturbations using DeepSeek-Reasoner CoT prompts that purify inputs and explain each defense decision

Input Manipulation Attack · Prompt Injection · nlp
PDF
attack arXiv Nov 14, 2025 · Nov 2025

Synthetic Voices, Real Threats: Evaluating Large Text-to-Speech Models in Generating Harmful Audio

Guangke Chen, Yuhui Wang, Shouling Ji et al. · Stony Brook University · Zhejiang University +1 more

Jailbreaks LALM-based TTS safety alignment via semantic obfuscation and audio-modality injection to generate harmful speech

Prompt Injection · audio · nlp · multimodal
PDF
attack arXiv Nov 12, 2025 · Nov 2025

SEBA: Sample-Efficient Black-Box Attacks on Visual Reinforcement Learning

Tairan Huang, Yulin Jin, Junxu Liu et al. · The Hong Kong Polytechnic University

Black-box adversarial attack on visual RL agents using GAN and shadow Q-model to minimize environment queries

Input Manipulation Attack · vision · reinforcement-learning
PDF
defense arXiv Nov 11, 2025 · Nov 2025

Class-feature Watermark: A Resilient Black-box Watermark Against Model Extraction Attacks

Yaxin Xiao, Qingqing Ye, Zi Liang et al. · The Hong Kong Polytechnic University · Huawei Technologies +1 more

Proposes WRK to break existing black-box model watermarks, then introduces CFW watermarking resilient to combined extraction and removal attacks

Model Theft · vision
PDF Code
defense CCS Nov 10, 2025 · Nov 2025

Harnessing Sparsification in Federated Learning: A Secure, Efficient, and Differentially Private Realization

Shuangqing Xu, Yifeng Zheng, Zhongyun Hua · Harbin Institute of Technology · The Hong Kong Polytechnic University

Defends FL against gradient inversion attacks via cryptographic sparse aggregation and differential privacy, beating ORAM by orders of magnitude

Model Inversion Attack · federated-learning
2 citations PDF
tool arXiv Oct 22, 2025 · Oct 2025

OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models

Thomas Wang, Haowen Li · OpenGuardrails.com · The Hong Kong Polytechnic University

Open-source LLM guardrails platform unifying prompt-injection defense, jailbreak detection, and PII redaction across 119 languages

Prompt Injection · Sensitive Information Disclosure · nlp
PDF Code
defense arXiv Oct 18, 2025 · Oct 2025

EDVD-LLaMA: Explainable Deepfake Video Detection via Multimodal Large Language Model Reasoning

Haoran Sun, Chen Cai, Huiping Zhuang et al. · The Hong Kong Polytechnic University · Nanyang Technological University +1 more

Explainable deepfake video detector using multimodal LLaMA with spatio-temporal chain-of-thought reasoning and facial hard constraints

Output Integrity Attack · vision · multimodal · nlp
PDF Code