Latest papers

20 papers
benchmark arXiv Mar 21, 2026 · 18d ago

Unveiling the Security Risks of Federated Learning in the Wild: From Research to Practice

Jiahao Chen, Zhiming Zhao, Yuwen Pu et al. · Zhejiang University · Chongqing University +1 more

Measurement study showing FL poisoning attacks are less effective in practice than research suggests due to heterogeneity and stability constraints

Data Poisoning Attack vision nlp tabular federated-learning
PDF Code
attack arXiv Feb 15, 2026 · 7w ago

SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement

Xiaojun Jia, Jie Liao, Simeng Qin et al. · Nanyang Technological University · Chongqing University +4 more

Automated framework that crafts stealthy skill-based prompt injections against LLM coding agents via trace-driven closed-loop refinement

Prompt Injection Insecure Plugin Design nlp
PDF
attack arXiv Jan 29, 2026 · 9w ago

On the Adversarial Robustness of Large Vision-Language Models under Visual Token Compression

Xinwei Zhang, Hangcheng Liu, Li Bai et al. · The Hong Kong Polytechnic University · Nanyang Technological University +1 more

Proposes CAGE, a compression-aware adversarial attack showing that the robustness of token-compressed VLMs is systematically overestimated by standard attacks

Input Manipulation Attack vision multimodal
PDF
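
The listing gives only the headline finding; as a rough, unofficial sketch of what "compression-aware" optimization can mean, the PGD loop below differentiates through a token-compression stage so the perturbation survives pruning or merging at inference. `model.visual_tokens`, `model.decode`, and `compressor` are hypothetical interfaces, not the paper's API.

```python
import torch

def compression_aware_pgd(model, compressor, x, loss_fn,
                          eps=8 / 255, steps=20, lr=2 / 255):
    """Untargeted PGD whose loss is computed through the token-compression
    stage, so the perturbation stays effective after tokens are compressed."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        tokens = model.visual_tokens(x + delta)           # hypothetical: patch embeddings
        loss = loss_fn(model.decode(compressor(tokens)))  # hypothetical decode head
        loss.backward()
        with torch.no_grad():
            delta += lr * delta.grad.sign()               # ascend: maximize the loss
            delta.clamp_(-eps, eps)
            delta.grad.zero_()
    return (x + delta).detach()
```
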
benchmark arXiv Jan 26, 2026 · 10w ago

MalURLBench: A Benchmark Evaluating Agents' Vulnerabilities When Processing Web URLs

Dezhang Kong, Zhuxi Wu, Shiqi Liu et al. · Zhejiang University · National University of Malaysia +4 more

Benchmark revealing LLM web agents fail to detect disguised malicious URLs across 61K attack instances in 10 real-world scenarios

Prompt Injection nlp
PDF Code
defense IEEE Transactions on Image Pro... Jan 23, 2026 · 10w ago

StealthMark: Harmless and Stealthy Ownership Verification for Medical Segmentation via Uncertainty-Guided Backdoors

Qinkai Yu, Chong Zhang, Gaojie Jin et al. · University of Exeter · King Abdullah University of Science and Technology +6 more

Embeds backdoor-based watermarks in medical segmentation models to verify ownership under black-box API conditions

Model Theft vision
PDF Code
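
A minimal sketch of how a trigger-based, black-box ownership check might run, assuming a `query_api` callable that returns a binary mask from the suspect model; the trigger patch, IoU threshold, and vote rule are illustrative, not the paper's protocol.

```python
import numpy as np

def embed_trigger(image: np.ndarray, patch: np.ndarray, x: int = 0, y: int = 0) -> np.ndarray:
    """Stamp a small trigger patch into the corner of an input image."""
    out = image.copy()
    h, w = patch.shape[:2]
    out[y:y + h, x:x + w] = patch
    return out

def verify_ownership(query_api, images, patch, expected_mask,
                     iou_thresh=0.5, vote_thresh=0.8):
    """Query a suspect black-box segmentation API with triggered inputs.
    Ownership is claimed if the model reproduces the watermark response
    (a pre-defined mask) on a sufficient fraction of triggered queries."""
    hits = 0
    for img in images:
        pred = query_api(embed_trigger(img, patch))   # suspect model's binary mask
        inter = np.logical_and(pred, expected_mask).sum()
        union = np.logical_or(pred, expected_mask).sum() + 1e-8
        hits += (inter / union) >= iou_thresh
    return hits / len(images) >= vote_thresh
```
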
defense arXiv Jan 19, 2026 · 11w ago

Proxy Robustness in Vision Language Models is Effortlessly Transferable

Xiaowei Fu, Fuxiang Huang, Lei Zhang · Chongqing University · Lingnan University

Transfers adversarial robustness across heterogeneous CLIP variants via proxy distillation, boosting VLM defense without costly adversarial teacher training

Input Manipulation Attack vision multimodal
PDF Code
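
One plausible reading of proxy-based robustness transfer, sketched below: a student encoder is distilled to match a robust proxy's embeddings on both clean and adversarial views, avoiding adversarial training of the student from scratch. The cosine-alignment loss and `alpha` weighting are assumptions, not the authors' exact objective.

```python
import torch
import torch.nn.functional as F

def proxy_distill_step(student, robust_proxy, images, adv_images, optimizer, alpha=0.5):
    """One step of feature distillation from a robust proxy encoder to a
    (possibly differently architected) student encoder."""
    with torch.no_grad():
        t_clean = F.normalize(robust_proxy(images), dim=-1)
        t_adv = F.normalize(robust_proxy(adv_images), dim=-1)
    s_clean = F.normalize(student(images), dim=-1)
    s_adv = F.normalize(student(adv_images), dim=-1)
    # Align student features with the robust proxy on clean and adversarial views.
    loss = alpha * (1 - (s_clean * t_clean).sum(-1)).mean() \
         + (1 - alpha) * (1 - (s_adv * t_adv).sum(-1)).mean()
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```
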
attack arXiv Jan 19, 2026 · 11w ago

CORVUS: Red-Teaming Hallucination Detectors via Internal Signal Camouflage in Large Language Models

Nay Myat Min, Long H. Pham, Hongyu Zhang et al. · Singapore Management University · Chongqing University

Attacks LLM hallucination detectors by fine-tuning LoRA adapters to camouflage internal uncertainty, hidden-state, and attention signals

Output Integrity Attack nlp
PDF
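
The summary implies an optimization that suppresses detector-visible internal signals while preserving the model's outputs; a toy version of such a loss follows. `probe` stands in for the detector's internal-signal classifier and is hypothetical, and the KL anchor to a frozen reference model is an assumption.

```python
import torch
import torch.nn.functional as F

def camouflage_loss(hidden, probe, logits, ref_logits, beta=0.1):
    """Loss for fine-tuning (e.g., LoRA adapters) so that an internal-signal
    hallucination probe is driven toward 'confident', while a KL term keeps
    the output distribution close to the frozen base model."""
    detector_score = torch.sigmoid(probe(hidden)).mean()   # probe: hidden -> hallucination logit
    kl = F.kl_div(F.log_softmax(logits, -1), F.softmax(ref_logits, -1),
                  reduction="batchmean")
    return detector_score + beta * kl
```
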
survey ICICML Jan 18, 2026 · 11w ago

Adversarial Defense in Vision-Language Models: An Overview

Xiaowei Fu, Lei Zhang · Chongqing University

Surveys three adversarial defense paradigms for VLMs—training-time, test-time adaptation, and training-free—highlighting tradeoffs and open challenges

Input Manipulation Attack vision nlp multimodal
PDF
defense arXiv Jan 17, 2026 · 11w ago

RCDN: Real-Centered Detection Network for Robust Face Forgery Identification

Wyatt McCurdy, Xin Zhang, Yuqi Song et al. · University of Southern Maine · Chongqing University

Proposes a real-centered CNN detector for face forgeries that generalizes across unseen AI-generation techniques, including diffusion models

Output Integrity Attack vision
PDF
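
"Real-centered" training is commonly realized with a one-class center objective; the toy loss below pulls real-face features toward a learned center and pushes forgeries beyond a margin. This is a generic sketch, not the paper's exact formulation.

```python
import torch

def real_center_loss(features, center, labels, margin=1.0):
    """Pull real-face features toward a shared center; push forgery features
    beyond a margin. labels: 1 for real, 0 for forged."""
    d = (features - center).pow(2).sum(dim=1)
    real = labels.float()
    return (real * d + (1 - real) * torch.clamp(margin - d, min=0)).mean()
```
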
attack arXiv Jan 1, 2026 · Jan 2026

When Agents See Humans as the Outgroup: Belief-Dependent Bias in LLM-Powered Agents

Zongwei Wang, Bincheng Gu, Hongyu Yu et al. · Chongqing University · The University of Queensland +2 more

Belief Poisoning Attack corrupts LLM agent profiles and memory so that agents treat humans as an outgroup, bypassing human-oriented safety behaviors

Prompt Injection Excessive Agency nlp
PDF Code
tool arXiv Dec 21, 2025 · Dec 2025

Learning-Based Automated Adversarial Red-Teaming for Robustness Evaluation of Large Language Models

Zhang Wei, Peilu Hu, Zhenyuan Wei et al. · Independent Researcher · Ltd. +12 more

Automated red-teaming tool for LLMs using meta-prompt-guided adversarial generation, finding 3.9× more vulnerabilities than manual testing

Prompt Injection nlp
1 citation PDF
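
A skeleton of the meta-prompt-guided closed loop the summary describes, with `attacker`, `target`, and `judge` as opaque callables; the scoring and selection rules are placeholders, not the tool's actual heuristics.

```python
def red_team_loop(target, attacker, judge, seed_prompts, meta_prompt, rounds=3):
    """Closed-loop adversarial prompt search: an attacker LLM rewrites candidate
    prompts under a meta-prompt, a judge scores the target's replies, and the
    highest-scoring prompts seed the next round."""
    pool, findings = list(seed_prompts), []
    for _ in range(rounds):
        candidates = [attacker(meta_prompt, p) for p in pool]    # adversarial rewrites
        scored = [(judge(c, target(c)), c) for c in candidates]  # score 1.0 = unsafe reply
        findings += [c for s, c in scored if s >= 0.5]
        scored.sort(key=lambda sc: sc[0], reverse=True)
        pool = [c for _, c in scored[:len(pool)]]                # keep the strongest
    return findings
```
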
attack arXiv Dec 11, 2025 · Dec 2025

The Eminence in Shadow: Exploiting Feature Boundary Ambiguity for Robust Backdoor Attacks

Zhou Feng, Jiahao Chen, Chunyi Zhou et al. · Zhejiang University · Chongqing University +1 more

Theoretically-grounded backdoor attack exploiting decision boundary ambiguity achieves >90% ASR at just 0.01% poison rate

Model Poisoning vision
PDF Code
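
The boundary-ambiguity sample selection is the paper's contribution; the generic poisoning step it builds on looks roughly like the sketch below, which stamps a trigger on 0.01% of training samples and relabels them to the attacker's target class.

```python
import numpy as np

def poison_dataset(images, labels, target_class, trigger, rate=0.0001, seed=0):
    """Dirty-label backdoor poisoning: stamp a trigger patch onto a tiny random
    subset of training images and relabel them to the target class."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=max(1, int(rate * len(images))), replace=False)
    x, y = images.copy(), labels.copy()
    h, w = trigger.shape[:2]
    for i in idx:
        x[i, -h:, -w:] = trigger   # bottom-right trigger patch
        y[i] = target_class
    return x, y, idx
```
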
benchmark arXiv Dec 6, 2025 · Dec 2025

OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation

Xiaojun Jia, Jie Liao, Qi Guo et al. · Nanyang Technological University · BraneMatrix AI +7 more

Unified benchmark and toolbox evaluating 13 attack methods and 15 defenses against multimodal jailbreaks across 18 open- and closed-source MLLMs

Prompt Injection multimodal nlp vision
5 citations PDF Code
defense arXiv Nov 20, 2025 · Nov 2025

Layer-wise Noise Guided Selective Wavelet Reconstruction for Robust Medical Image Segmentation

Yuting Lu, Ziliang Wang, Weixin Xu et al. · Chongqing University · Peking University

Defends medical image segmentation against PGD/SSAH adversarial attacks via layer-wise noise-guided selective wavelet reconstruction

Input Manipulation Attack vision
PDF
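
A fixed-rule approximation of selective wavelet reconstruction using PyWavelets: zero the finest detail bands, where adversarial noise tends to concentrate. The paper's layer-wise noise guidance chooses bands adaptively; the manual `keep` parameter here is a stand-in.

```python
import numpy as np
import pywt

def wavelet_purify(image: np.ndarray, wavelet="db4", level=3, keep=2):
    """Decompose the image, keep the `keep` coarsest detail levels, and zero
    the finest bands before reconstructing."""
    coeffs = pywt.wavedec2(image, wavelet=wavelet, level=level)
    for i in range(1 + keep, len(coeffs)):   # coeffs[0] is the approximation
        coeffs[i] = tuple(np.zeros_like(c) for c in coeffs[i])
    return pywt.waverec2(coeffs, wavelet=wavelet)
```
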
defense International Journal of Compu... Nov 14, 2025 · Nov 2025

Unsupervised Robust Domain Adaptation: Paradigm, Theory and Algorithm

Fuxiang Huang, Xiaowei Fu, Shiyu Ye et al. · Chongqing University · Lingnan University +3 more

Defends unsupervised domain adaptation models against adversarial attacks via disentangled distillation post-training

Input Manipulation Attack vision
PDF
attack arXiv Sep 24, 2025 · Sep 2025

Improving Generalizability and Undetectability for Targeted Adversarial Attacks on Multimodal Pre-trained Models

Zhifang Zhang, Jiahan Zhang, Shengjie Zhou et al. · Southeast University · Johns Hopkins University +3 more

Proposes Proxy Targeted Attack to craft generalizable, anomaly-evasive adversarial examples against multimodal encoders like ImageBind

Input Manipulation Attack vision multimodal nlp
2 citations PDF
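
Stripped of the proxy-target and anomaly-evasion terms that are the paper's contribution, the underlying targeted attack on a shared embedding space reduces to PGD toward a target embedding, sketched below under standard pixel-range and L-infinity constraints.

```python
import torch
import torch.nn.functional as F

def targeted_embedding_attack(encoder, x, target_emb,
                              eps=8 / 255, steps=100, lr=1 / 255):
    """PGD that pulls an image's embedding toward a chosen target embedding
    (e.g., a caption encoded into the same multimodal space)."""
    delta = torch.zeros_like(x, requires_grad=True)
    target = F.normalize(target_emb, dim=-1)
    for _ in range(steps):
        emb = F.normalize(encoder(x + delta), dim=-1)
        loss = (1 - (emb * target).sum(-1)).mean()   # cosine distance to target
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()          # descend: move toward target
            delta.clamp_(-eps, eps)
            delta.copy_((x + delta).clamp(0, 1) - x) # stay in valid pixel range
            delta.grad.zero_()
    return (x + delta).detach()
```
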
defense arXiv Sep 17, 2025 · Sep 2025

Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning

Zhaoyang Chu, Yao Wan, Zhikun Zhang et al. · Huazhong University of Science and Technology · Zhejiang University +4 more

Defends code LLMs against sensitive training data extraction by selectively unlearning memorized PII and credentials via gradient ascent

Model Inversion Attack Sensitive Information Disclosure nlp
PDF
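
A common recipe for this kind of targeted unlearning, shown as a sketch: gradient ascent on the memorized sensitive sequences (forget set) combined with ordinary descent on a retain set to preserve general coding ability. It assumes a Hugging Face-style causal LM whose forward pass returns `.loss`; the paper's selective variant is more careful than this.

```python
import torch

def unlearn_step(model, forget_batch, retain_batch, optimizer, lam=1.0):
    """One machine-unlearning step: ascend on the forget set, descend on the
    retain set."""
    forget_loss = model(**forget_batch).loss   # causal-LM loss on sensitive data
    retain_loss = model(**retain_batch).loss   # causal-LM loss on benign data
    loss = -forget_loss + lam * retain_loss    # negated term = gradient ascent
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return forget_loss.item(), retain_loss.item()
```
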
defense arXiv Aug 30, 2025 · Aug 2025

FreeTalk: A plug-and-play and black-box defense against speech synthesis attacks

Yuwen Pu, Zhou Feng, Chunyi Zhou et al. · Chongqing University · Zhejiang University

Adds frequency-domain adversarial perturbations to audio in a black-box setting to prevent voice cloning by VC/TTS models

Input Manipulation Attack audio
PDF
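
The sketch below only illustrates the frequency-domain injection idea, adding random noise confined to a mid/high band of the spectrum; FreeTalk optimizes its perturbation, and the band choice and scale here are arbitrary assumptions.

```python
import numpy as np

def frequency_perturb(waveform: np.ndarray, eps=0.002, bands=(0.3, 0.9), seed=0):
    """Add a small perturbation confined to mid/high frequency bands, where
    speaker-identity cues exploited by VC/TTS cloning tend to live."""
    rng = np.random.default_rng(seed)
    spec = np.fft.rfft(waveform)
    n = len(spec)
    lo, hi = int(bands[0] * n), int(bands[1] * n)
    noise = rng.standard_normal(hi - lo) + 1j * rng.standard_normal(hi - lo)
    spec[lo:hi] += eps * np.abs(spec[lo:hi]).mean() * noise
    return np.fft.irfft(spec, n=len(waveform)).astype(waveform.dtype)
```
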
tool arXiv Aug 18, 2025 · Aug 2025

Prompt-Induced Linguistic Fingerprints for LLM-Generated Fake News Detection

Chi Wang, Min Gao, Zongwei Wang et al. · Chongqing University · Emory University +1 more

Detects LLM-generated fake news by extracting prompt-induced linguistic fingerprints from reconstructed word-level probability distributions

Output Integrity Attack nlp
PDF Code
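
Reconstructing a word-level probability profile under a scoring LM might start like the sketch below (a Hugging Face-style causal LM and tokenizer are assumed); the paper's fingerprint features are richer than these summary statistics.

```python
import torch

def token_logprob_features(model, tokenizer, text: str):
    """Score each realized token's log-probability under a causal LM; summary
    statistics of this distribution serve as a crude linguistic fingerprint."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, :-1]                     # predicts next token
    logp = torch.log_softmax(logits, dim=-1)
    tok_lp = logp[torch.arange(ids.shape[1] - 1), ids[0, 1:]]  # realized tokens
    return {"mean": tok_lp.mean().item(), "std": tok_lp.std().item(),
            "min": tok_lp.min().item()}
```
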
attack arXiv Aug 8, 2025 · Aug 2025

Latent Fusion Jailbreak: Blending Harmful and Harmless Representations to Elicit Unsafe LLM Outputs

Wenpeng Xing, Mohan Li, Chunqiang Hu et al. · Bingjiang Institute of Zhejiang University · Zhejiang University +3 more

White-box jailbreak fuses harmful and benign hidden states in latent space to bypass LLM safety alignment with 94% ASR

Input Manipulation Attack Prompt Injection nlp
PDF
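
A white-box sketch of the blending step: interpolate hidden states captured from a harmful and a harmless prompt, then inject the fused states back at one decoder layer with a forward hook. The layer choice, `alpha`, and hook mechanics are illustrative assumptions, not the paper's exact procedure.

```python
import torch

def fuse_hidden_states(h_harmful, h_benign, alpha=0.6):
    """Blend hidden states from a harmful and a harmless prompt."""
    return alpha * h_harmful + (1 - alpha) * h_benign

def make_injection_hook(h_fused):
    """Forward hook that overwrites the last positions of a decoder layer's
    hidden states with the fused representation during generation."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden[:, -h_fused.shape[1]:, :] = h_fused
        return output
    return hook
```
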