Latest papers

33 papers
defense arXiv Apr 27, 2026 · 24d ago

LAVA: Layered Audio-Visual Anti-tampering Watermarking for Robust Deepfake Detection and Localization

Bokang Zeng, Zheng Gao, Xiaoyu Li et al. · UNSW Sydney · Griffith University

Audio-visual watermarking framework that detects and localizes deepfake tampering in videos while surviving compression and multimodal misalignment

Output Integrity Attack multimodal vision audio
PDF
attack arXiv Apr 14, 2026 · 5w ago

CIA: Inferring the Communication Topology from LLM-based Multi-Agent Systems

Yongxuan Wu, Xixun Lin, He Zhang et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +2 more

Black-box attack inferring LLM multi-agent system communication topologies via adversarial queries, achieving 99% peak AUC

Model Theft Excessive Agency nlp
PDF Code
attack arXiv Apr 3, 2026 · 6w ago

Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems

Yubin Qu, Yi Liu, Tongcheng Geng et al. · Griffith University · Quantstamp +6 more

Supply-chain attack embedding malicious payloads in LLM agent skill documentation, achieving up to a 33.5% bypass rate against defenses

AI Supply Chain Attacks Insecure Plugin Design Excessive Agency nlp
PDF
benchmark arXiv Apr 3, 2026 · 6w ago

Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study

Zhihao Chen, Ying Zhang, Yi Liu et al. · Fujian Normal University · Wake Forest University +7 more

Large-scale analysis of 17K LLM agent skills finding 520 vulnerable to credential leakage via debug logging and prompt injection

AI Supply Chain Attacks Prompt Injection Insecure Plugin Design nlp
PDF
attack arXiv Mar 31, 2026 · 7w ago

SHIFT: Stochastic Hidden-Trajectory Deflection for Removing Diffusion-based Watermark

Rui Bao, Zheng Gao, Xiaoyu Li et al. · University of New South Wales · Griffith University

Training-free attack that removes diffusion-based watermarks by deflecting generation trajectories, achieving 95-100% success across nine methods

Output Integrity Attack vision generative
PDF
attack arXiv Mar 18, 2026 · 9w ago

ARES: Scalable and Practical Gradient Inversion Attack in Federated Learning through Activation Recovery

Zirui Gong, Leo Yu Zhang, Yanjun Zhang et al. · Griffith University · Swinburne University of Technology +2 more

Gradient inversion attack reconstructing training data from federated learning updates via sparse activation recovery without architectural changes

Model Inversion Attack vision federated-learning
PDF
attack arXiv Mar 17, 2026 · 9w ago

Poisoning the Pixels: Revisiting Backdoor Attacks on Semantic Segmentation

Guangsheng Zhang, Huan Tian, Leo Zhang et al. · University of Technology Sydney · Griffith University +2 more

Backdoor framework for semantic segmentation introducing six attack vectors and optimized triggers, bypassing existing defenses

Model Poisoning Data Poisoning Attack vision
PDF
defense arXiv Mar 13, 2026 · 9w ago

SLICE: Semantic Latent Injection via Compartmentalized Embedding for Image Watermarking

Zheng Gao, Yifan Yang, Xiaoyu Li et al. · University of New South Wales · Griffith University

Fine-grained semantic watermarking for diffusion models that embeds tamper-detectable signals across four semantic factors in initial noise

Output Integrity Attack vision generative
PDF
attack arXiv Feb 25, 2026 · 12w ago

Breaking Semantic-Aware Watermarks via LLM-Guided Coherence-Preserving Semantic Injection

Zheng Gao, Xiaoyu Li, Zhicheng Bao et al. · University of New South Wales · Griffith University

LLM-guided semantic injection attack that bypasses content-aware watermarks in diffusion-generated images by preserving global coherence while invalidating watermark bindings

Output Integrity Attack vision generative nlp
PDF
attack arXiv Feb 11, 2026 · Feb 2026

Transferable Backdoor Attacks for Code Models via Sharpness-Aware Adversarial Perturbation

Shuyu Chang, Haiping Huang, Yanjun Zhang et al. · Nanjing University of Posts and Telecommunications · State Key Laboratory of Tibetan Intelligence +5 more

Backdoor attack on code models using sharpness-aware training and Gumbel-Softmax triggers for cross-dataset transferability and stealthiness

Model Poisoning nlp
PDF
benchmark arXiv Feb 6, 2026 · Feb 2026

Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study

Yi Liu, Zhihao Chen, Yanjun Zhang et al. · Quantstamp · Fujian Normal University +4 more

Empirical study of 98,380 LLM agent skills finds 157 malicious ones using supply chain theft and instruction hijacking

AI Supply Chain Attacks Insecure Plugin Design Prompt Injection nlp
2 citations 1 influential PDF
attack arXiv Feb 2, 2026 · Feb 2026

Exposing Vulnerabilities in Explanation for Time Series Classifiers via Dual-Target Attacks

Bohan Wang, Zewen Liu, Lu Lin et al. · Emory University · The Pennsylvania State University +2 more

Adversarially decouples time series classifier predictions from explanations, enabling targeted misclassification with plausible-looking cover-up explanations

Input Manipulation Attack timeseries
PDF
defense arXiv Jan 28, 2026 · Jan 2026

UnlearnShield: Shielding Forgotten Privacy against Unlearning Inversion

Lulu Xue, Shengshan Hu, Wei Lu et al. · Huazhong University of Science and Technology · Institute of Guizhou Aerospace Measuring and Testing Technology +2 more

Defends machine unlearning against inversion attacks that reconstruct erased training data, using cosine-space perturbations

Model Inversion Attack vision
PDF
attack arXiv Jan 21, 2026 · Jan 2026

Beyond Denial-of-Service: The Puppeteer's Attack for Fine-Grained Control in Ranking-Based Federated Learning

Zhihao Chen, Zirui Gong, Jianting Ning et al. · Fujian Normal University · Griffith University

Novel federated poisoning attack precisely degrades global model accuracy to any target level while evading Byzantine-robust aggregation defenses

Data Poisoning Attack federated-learning
PDF Code
defense arXiv Jan 21, 2026 · Jan 2026

Erosion Attack for Adversarial Training to Enhance Semantic Segmentation Robustness

Yufei Song, Ziqi Zhou, Menghao Deng et al. · Huazhong University of Science and Technology · National University of Singapore +1 more

Proposes erosion-based adversarial attack on segmentation models that propagates perturbations from low- to high-confidence pixels, used to strengthen adversarial training robustness

Input Manipulation Attack vision
PDF
attack arXiv Jan 17, 2026 · Jan 2026

Gradient Structure Estimation under Label-Only Oracles via Spectral Sensitivity

Jun Liu, Leo Yu Zhang, Fengpeng Li et al. · University of Macau · National Institute of Informatics +2 more

Hard-label black-box adversarial attack using frequency-domain initialization and pattern-driven optimization to recover gradient sign information

Input Manipulation Attack vision
PDF Code
attack arXiv Jan 17, 2026 · Jan 2026

Less Is More -- Until It Breaks: Security Pitfalls of Vision Token Compression in Large Vision-Language Models

Xiaomei Zhang, Zhaoxi Zhang, Leo Yu Zhang et al. · Griffith University · University of Technology Sydney +1 more

Adversarial attack exploits visual token compression in VLMs by perturbing token importance rankings, causing failures only under compressed inference

Input Manipulation Attack Prompt Injection vision nlp multimodal
PDF
tool arXiv Jan 15, 2026 · Jan 2026

Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale

Yi Liu, Weizhe Wang, Ruitao Feng et al. · Nanyang Technological University · Tianjin University +4 more

Scans 31K AI agent skills from marketplaces, finding 26% contain vulnerabilities including prompt injection, data exfiltration, and supply chain risks

AI Supply Chain Attacks Insecure Plugin Design Prompt Injection nlp
8 citations 2 influential PDF
defense arXiv Dec 21, 2025 · Dec 2025

Explainable and Fine-Grained Safeguarding of LLM Multi-Agent Systems via Bi-Level Graph Anomaly Detection

Junjun Pan, Yixin Liu, Rui Miao et al. · Griffith University · Jilin University +1 more

Defends LLM multi-agent systems by detecting malicious agents using bi-level graph anomaly detection with token-level explainability

Excessive Agency nlp graph
1 citation PDF
attack arXiv Dec 18, 2025 · Dec 2025

Dual-View Inference Attack: Machine Unlearning Amplifies Privacy Exposure

Lulu Xue, Shengshan Hu, Linqiang Qian et al. · Huazhong University of Science and Technology · Tsinghua University +4 more

Novel black-box membership inference attack (MIA) that exploits dual-model access after unlearning to infer membership of retained data via likelihood-ratio inference

Membership Inference Attack vision
2 citations PDF