Latest papers

64 papers
attack arXiv Apr 23, 2026 · 28d ago

Toward Efficient Membership Inference Attacks against Federated Large Language Models: A Projection Residual Approach

Guilin Deng, Silong Chen, Yuchuan Luo et al. · National University of Defense Technology · City University of Hong Kong +1 more

Gradient-based membership inference attack on federated LLMs achieving near-perfect accuracy via projection residual analysis

Membership Inference Attack nlp federated-learning
PDF Code
attack arXiv Apr 18, 2026 · 4w ago

Visual Inception: Compromising Long-term Planning in Agentic Recommenders via Multimodal Memory Poisoning

Jiachen Qian · City University of Hong Kong

Multimodal memory poisoning attack that embeds visual triggers in images to hijack AI agent planning, plus dual-process defense

Input Manipulation Attack Data Poisoning Attack Prompt Injection Excessive Agency multimodal nlp
PDF
defense arXiv Apr 18, 2026 · 4w ago

CapSeal: Capability-Sealed Secret Mediation for Secure Agent Execution

Shutong Jin, Ruiyi Guo, Ray C. C. Cheung · City University of Hong Kong · Beijing Foreign Studies University

Broker-mediated capability system that prevents AI agents from directly accessing secrets, defending against prompt injection exfiltration

Prompt Injection Insecure Plugin Design nlp
PDF
attack arXiv Apr 16, 2026 · 5w ago

Physically-Induced Atmospheric Adversarial Perturbations: Enhancing Transferability and Robustness in Remote Sensing Image Classification

Weiwei Zhuang, Wangze Xie, Qi Zhang et al. · Xiamen University of Technology · City University of Macau +8 more

Generates physically plausible fog-based adversarial perturbations for remote sensing classifiers with high transferability and defense robustness

Input Manipulation Attack vision
PDF
defense arXiv Apr 1, 2026 · 7w ago

PDA: Text-Augmented Defense Framework for Robust Vision-Language Models against Adversarial Image Attacks

Jingning Xu, Haochen Luo, Chen Liu · City University of Hong Kong

Training-free defense using text augmentation to protect VLMs against diverse adversarial image perturbations at inference time

Input Manipulation Attack multimodal vision nlp
PDF
attack arXiv Mar 25, 2026 · 8w ago

How Vulnerable Are Edge LLMs?

Ao Ding, Hongzong Li, Zi Liang et al. · China University of Geosciences · Hong Kong University of Science and Technology +4 more

Query-based extraction attack on quantized edge LLMs using clustered instruction queries to steal model behavior efficiently

Model Theft nlp
PDF
defense arXiv Mar 23, 2026 · 8w ago

Disentangling Speaker Traits for Deepfake Source Verification via Chebyshev Polynomial and Riemannian Metric Learning

Xi Xuan, Wenxin Zhang, Zhiyu Li et al. · University of Eastern Finland · City University of Hong Kong +3 more

Disentangles speaker traits from deepfake source embeddings using Chebyshev polynomials and Riemannian geometry for robust generator verification

Output Integrity Attack audio generative
PDF Code
attack arXiv Mar 18, 2026 · 9w ago

TINA: Text-Free Inversion Attack for Unlearned Text-to-Image Diffusion Models

Qianlong Xiang, Miao Zhang, Haoyu Zhang et al. · Harbin Institute of Technology · City University of Hong Kong +3 more

Text-free inversion attack that recovers supposedly erased concepts from diffusion models by exploiting persistent visual knowledge

Model Inversion Attack vision generative
PDF
attack arXiv Mar 18, 2026 · 9w ago

ARES: Scalable and Practical Gradient Inversion Attack in Federated Learning through Activation Recovery

Zirui Gong, Leo Yu Zhang, Yanjun Zhang et al. · Griffith University · Swinburne University of Technology +2 more

Gradient inversion attack reconstructing training data from federated learning updates via sparse activation recovery without architectural changes

Model Inversion Attack vision federated-learning
PDF
benchmark arXiv Mar 11, 2026 · 10w ago

Probabilistic Verification of Voice Anti-Spoofing Models

Evgeny Kushnir, Alexandr Kozodaev, Dmitrii Korzh et al. · AXXX · HSE +5 more

Proposes PV-VASM, a black-box probabilistic framework that formally bounds misclassification risk of speech deepfake detectors against TTS and voice cloning attacks

Output Integrity Attack audio
PDF
defense arXiv Mar 11, 2026 · 10w ago

AttriGuard: Defeating Indirect Prompt Injection in LLM Agents via Causal Attribution of Tool Invocations

Yu He, Haozhe Zhu, Yiming Li et al. · Zhejiang University · Nanyang Technological University +1 more

Runtime defense for LLM agents detecting indirect prompt injection via causal counterfactual analysis of tool invocations

Prompt Injection nlp
PDF Code
benchmark arXiv Mar 8, 2026 · 10w ago

Give Them an Inch and They Will Take a Mile: Understanding and Measuring Caller Identity Confusion in MCP-Based AI Systems

Yuhang Huang, Boyang Ma, Biwei Yan et al. · Shandong University · City University of Hong Kong

Large-scale empirical analysis reveals MCP servers fail to authenticate callers, enabling unauthorized tool access in LLM agent systems

Insecure Plugin Design nlp
PDF
defense arXiv Feb 24, 2026 · 12w ago

RecoverMark: Robust Watermarking for Localization and Recovery of Manipulated Faces

Haonan An, Xiaohui Ye, Guang Hua et al. · South China University of Technology · Singapore Institute of Technology +1 more

Embeds face content as background watermark to robustly detect, localize, and recover manipulated face regions against removal attacks

Output Integrity Attack vision generative
PDF
attack arXiv Feb 24, 2026 · 12w ago

OptiLeak: Efficient Prompt Reconstruction via Reinforcement Learning in Multi-tenant LLM Services

Longxiang Wang, Xiang Zheng, Xuhao Zhang et al. · City University of Hong Kong · ByteDance

Attacks multi-tenant LLM services via KV cache side-channels to reconstruct private prompts with 12× efficiency gains

Sensitive Information Disclosure nlp
PDF
attack arXiv Feb 23, 2026 · 12w ago

PA-Attack: Guiding Gray-Box Attacks on LVLM Vision Encoders with Prototypes and Attention

Hefei Mei, Zirui Wang, Chang Xu et al. · City University of Hong Kong · The University of Sydney

Gray-box adversarial attack on LVLM vision encoders using prototype anchoring and attention-guided perturbations, achieving 75.1% score reduction

Input Manipulation Attack Prompt Injection vision multimodal nlp
PDF Code
defense arXiv Feb 11, 2026 · Feb 2026

Mitigating Gradient Inversion Risks in Language Models via Token Obfuscation

Xinguo Feng, Zhongkui Ma, Zihan Wang et al. · The University of Queensland · CSIRO’s Data61 +1 more

Defends collaborative LLM training against gradient inversion by replacing tokens with semantically disconnected yet embedding-proximate shadow substitutes

Model Inversion Attack Sensitive Information Disclosure nlp federated-learning
PDF
attack arXiv Feb 6, 2026 · Feb 2026

Confundo: Learning to Generate Robust Poison for Practical RAG Systems

Haoyang Hu, Zhejun Jiang, Yueming Lyu et al. · The University of Hong Kong · Nanjing University +1 more

Fine-tunes an LLM as a poison generator to inject robust, chunking-aware malicious content into RAG knowledge bases

Data Poisoning Attack Prompt Injection nlp
PDF
defense arXiv Feb 4, 2026 · Feb 2026

SIDeR: Semantic Identity Decoupling for Unrestricted Face Privacy

Zhuosen Bao, Xia Du, Zheng Lin et al. · Xiamen University of Technology · University of Hong Kong +8 more

Generates unrestricted adversarial faces using diffusion models to evade facial recognition with 99% black-box success rate

Input Manipulation Attack vision generative
PDF
defense arXiv Jan 30, 2026 · Jan 2026

Color Matters: Demosaicing-Guided Color Correlation Training for Generalizable AI-Generated Image Detection

Nan Zhong, Yiran Xu, Mian Zou · City University of Hong Kong · Fudan University +1 more

Detects AI-generated images via camera CFA color correlations, achieving state-of-the-art generalization across 20+ unseen generators

Output Integrity Attack vision
PDF
attack arXiv Jan 29, 2026 · Jan 2026

Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs

Xiang Zheng, Yutao Wu, Hanxun Huang et al. · City University of Hong Kong · Deakin University +4 more

Self-evolving agent framework extracts hidden system prompts from 41 commercial LLMs using UCB-guided natural language probing strategies

Sensitive Information Disclosure Prompt Injection nlp
PDF