Latest papers

29 papers
attack arXiv Mar 31, 2026 · 6d ago

SHIFT: Stochastic Hidden-Trajectory Deflection for Removing Diffusion-based Watermark

Rui Bao, Zheng Gao, Xiaoyu Li et al. · University of New South Wales · Griffith University

Training-free attack that removes diffusion-based watermarks by deflecting generation trajectories, achieving 95-100% success across nine methods

Output Integrity Attack vision generative
PDF
attack arXiv Mar 18, 2026 · 19d ago

ARES: Scalable and Practical Gradient Inversion Attack in Federated Learning through Activation Recovery

Zirui Gong, Leo Yu Zhang, Yanjun Zhang et al. · Griffith University · Swinburne University of Technology +2 more

Gradient inversion attack reconstructing training data from federated learning updates via sparse activation recovery without architectural changes

Model Inversion Attack vision federated-learning
PDF
attack arXiv Mar 17, 2026 · 20d ago

Poisoning the Pixels: Revisiting Backdoor Attacks on Semantic Segmentation

Guangsheng Zhang, Huan Tian, Leo Zhang et al. · University of Technology Sydney · Griffith University +2 more

Backdoor framework for semantic segmentation introducing six attack vectors and optimized triggers, bypassing existing defenses

Model Poisoning Data Poisoning Attack vision
PDF
defense arXiv Mar 13, 2026 · 24d ago

SLICE: Semantic Latent Injection via Compartmentalized Embedding for Image Watermarking

Zheng Gao, Yifan Yang, Xiaoyu Li et al. · University of New South Wales · Griffith University

Fine-grained semantic watermarking for diffusion models that embeds tamper-detectable signals across four semantic factors in initial noise

Output Integrity Attack vision generative
PDF
attack arXiv Feb 25, 2026 · 5w ago

Breaking Semantic-Aware Watermarks via LLM-Guided Coherence-Preserving Semantic Injection

Zheng Gao, Xiaoyu Li, Zhicheng Bao et al. · University of New South Wales · Griffith University

LLM-guided semantic injection attack that bypasses content-aware watermarks in diffusion-generated images by preserving global coherence while invalidating watermark bindings

Output Integrity Attack vision generative nlp
PDF
attack arXiv Feb 11, 2026 · 7w ago

Transferable Backdoor Attacks for Code Models via Sharpness-Aware Adversarial Perturbation

Shuyu Chang, Haiping Huang, Yanjun Zhang et al. · Nanjing University of Posts and Telecommunications · State Key Laboratory of Tibetan Intelligence +5 more

Backdoor attack on code models using sharpness-aware training and Gumbel-Softmax triggers for cross-dataset transferability and stealthiness

Model Poisoning nlp
PDF
benchmark arXiv Feb 6, 2026 · 8w ago

Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study

Yi Liu, Zhihao Chen, Yanjun Zhang et al. · Quantstamp · Fujian Normal University +4 more

Empirical study of 98,380 LLM agent skills finds 157 malicious ones using supply chain theft and instruction hijacking

AI Supply Chain Attacks Insecure Plugin Design Prompt Injection nlp
2 citations · 1 influential · PDF
attack arXiv Feb 2, 2026 · 9w ago

Exposing Vulnerabilities in Explanation for Time Series Classifiers via Dual-Target Attacks

Bohan Wang, Zewen Liu, Lu Lin et al. · Emory University · The Pennsylvania State University +2 more

Adversarially decouples time series classifier predictions from explanations, enabling targeted misclassification with plausible-looking cover-up explanations

Input Manipulation Attack timeseries
PDF
defense arXiv Jan 28, 2026 · 9w ago

UnlearnShield: Shielding Forgotten Privacy against Unlearning Inversion

Lulu Xue, Shengshan Hu, Wei Lu et al. · Huazhong University of Science and Technology · Institute of Guizhou Aerospace Measuring and Testing Technology +2 more

Defends machine unlearning against inversion attacks that reconstruct erased training data via cosine-space perturbations

Model Inversion Attack vision
PDF
attack arXiv Jan 21, 2026 · 10w ago

Beyond Denial-of-Service: The Puppeteer's Attack for Fine-Grained Control in Ranking-Based Federated Learning

Zhihao Chen, Zirui Gong, Jianting Ning et al. · Fujian Normal University · Griffith University

Novel federated poisoning attack precisely degrades global model accuracy to any target level while evading Byzantine-robust aggregation defenses

Data Poisoning Attack federated-learning
PDF Code
defense arXiv Jan 21, 2026 · 10w ago

Erosion Attack for Adversarial Training to Enhance Semantic Segmentation Robustness

Yufei Song, Ziqi Zhou, Menghao Deng et al. · Huazhong University of Science and Technology · National University of Singapore +1 more

Proposes erosion-based adversarial attack on segmentation models that propagates perturbations from low- to high-confidence pixels, used to strengthen adversarial training robustness

Input Manipulation Attack vision
PDF
attack arXiv Jan 17, 2026 · 11w ago

Gradient Structure Estimation under Label-Only Oracles via Spectral Sensitivity

Jun Liu, Leo Yu Zhang, Fengpeng Li et al. · University of Macau · National Institute of Informatics +2 more

Hard-label black-box adversarial attack using frequency-domain initialization and pattern-driven optimization to recover gradient sign information

Input Manipulation Attack vision
PDF Code
attack arXiv Jan 17, 2026 · 11w ago

Less Is More -- Until It Breaks: Security Pitfalls of Vision Token Compression in Large Vision-Language Models

Xiaomei Zhang, Zhaoxi Zhang, Leo Yu Zhang et al. · Griffith University · University of Technology Sydney +1 more

Adversarial attack exploits visual token compression in VLMs by perturbing token importance rankings, causing failures only under compressed inference

Input Manipulation Attack Prompt Injection vision nlp multimodal
PDF
tool arXiv Jan 15, 2026 · 11w ago

Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale

Yi Liu, Weizhe Wang, Ruitao Feng et al. · Nanyang Technological University · Tianjin University +4 more

Scans 31K AI agent skills from marketplaces, finding 26% contain vulnerabilities including prompt injection, data exfiltration, and supply chain risks

AI Supply Chain Attacks Insecure Plugin Design Prompt Injection nlp
8 citations · 2 influential · PDF
defense arXiv Dec 21, 2025 · Dec 2025

Explainable and Fine-Grained Safeguarding of LLM Multi-Agent Systems via Bi-Level Graph Anomaly Detection

Junjun Pan, Yixin Liu, Rui Miao et al. · Griffith University · Jilin University +1 more

Defends LLM multi-agent systems by detecting malicious agents using bi-level graph anomaly detection with token-level explainability

Excessive Agency nlp graph
1 citation · PDF
attack arXiv Dec 18, 2025 · Dec 2025

Dual-View Inference Attack: Machine Unlearning Amplifies Privacy Exposure

Lulu Xue, Shengshan Hu, Linqiang Qian et al. · Huazhong University of Science and Technology · Tsinghua University +4 more

Novel black-box MIA exploits dual-model access after unlearning to infer membership of retained data via likelihood ratio inference

Membership Inference Attack vision
2 citations · PDF
defense arXiv Nov 13, 2025 · Nov 2025

Debiased Dual-Invariant Defense for Adversarially Robust Person Re-Identification

Yuhang Zhou, Yanxiang Zhao, Zhongyun Hua et al. · Harbin Institute of Technology · Chongqing University of Technology +2 more

Proposes novel adversarial training defense for person ReID metric learning via debiased resampling and self-meta generalization across unseen attacks

Input Manipulation Attack vision
PDF Code
attack arXiv Oct 28, 2025 · Oct 2025

Vanish into Thin Air: Cross-prompt Universal Adversarial Attacks for SAM2

Ziqi Zhou, Yifan Hu, Yufei Song et al. · Huazhong University of Science and Technology · Griffith University

Proposes universal adversarial perturbations that break SAM2 video segmentation via dual semantic deviation across prompts and frames

Input Manipulation Attack vision
10 citations · PDF
defense Industrial Conference on Data ... Oct 16, 2025 · Oct 2025

TED++: Submanifold-Aware Backdoor Detection via Layerwise Tubular-Neighbourhood Screening

Nam Le, Leo Yu Zhang, Kewen Liao et al. · Deakin University · Griffith University

Detects backdoored inputs by screening activations against per-class tubular manifold neighborhoods across all layers

Model Poisoning vision
PDF Code
attack IEEE transactions on multimedi... Oct 10, 2025 · Oct 2025

SegTrans: Transferable Adversarial Examples for Segmentation Models

Yufei Song, Ziqi Zhou, Qi Lu et al. · Huazhong University of Science and Technology · Griffith University

Novel transfer attack for segmentation models using local semantic remapping achieves 8.55% higher success than SOTA

Input Manipulation Attack vision
5 citations · PDF