Latest papers

5 papers
attack arXiv Mar 6, 2026 · 4w ago

Knowing without Acting: The Disentangled Geometry of Safety Mechanisms in Large Language Models

Jinman Wu, Yi Xie, Shen Lin et al. · Xidian University · Tsinghua University +2 more

Discovers two disentangled safety subspaces in LLMs and exploits them to surgically disable refusal while preserving harmfulness recognition

Prompt Injection nlp
PDF Code
benchmark arXiv Feb 6, 2026 · 8w ago

Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study

Yi Liu, Zhihao Chen, Yanjun Zhang et al. · Quantstamp · Fujian Normal University +4 more

Empirical study of 98,380 LLM agent skills finds 157 malicious ones using supply chain theft and instruction hijacking

AI Supply Chain Attacks Insecure Plugin Design Prompt Injection nlp
2 citations 1 influentialPDF
attack arXiv Jan 21, 2026 · 10w ago

Beyond Denial-of-Service: The Puppeteer's Attack for Fine-Grained Control in Ranking-Based Federated Learning

Zhihao Chen, Zirui Gong, Jianting Ning et al. · Fujian Normal University · Griffith University

Novel federated poisoning attack precisely degrades global model accuracy to any target level while evading Byzantine-robust aggregation defenses

Data Poisoning Attack federated-learning
PDF Code
attack arXiv Dec 26, 2025 · Dec 2025

Backdoor Attacks on Prompt-Driven Video Segmentation Foundation Models

Zongmin Zhang, Zhen Sun, Yifan Liao et al. · Hong Kong University of Science and Technology · Nanjing University of Aeronautics and Astronautics +2 more

Proposes BadVSFM, a two-stage backdoor attack on prompt-driven video segmentation models where classic backdoors fail (<5% ASR)

Model Poisoning vision
PDF
attack arXiv Nov 21, 2025 · Nov 2025

Enhancing Adversarial Transferability through Block Stretch and Shrink

Quan Liu, Feng Ye, Chenhao Lu et al. · Fujian Normal University

Proposes block-level stretch/shrink image transforms to diversify attention and boost adversarial example transferability to black-box models

Input Manipulation Attack vision
PDF