Latest papers

3,669 papers
attack arXiv Apr 2, 2026 · 4d ago

Spike-PTSD: A Bio-Plausible Adversarial Example Attack on Spiking Neural Networks via PTSD-Inspired Spike Scaling

Lingxin Jin, Wei Jiang, Maregu Assefa Habtie et al. · University of Electronic Science and Technology · Khalifa University

Bio-inspired adversarial attack on Spiking Neural Networks achieving 99% success by exploiting PTSD-like abnormal neuron firing patterns

Input Manipulation Attack vision
PDF Code
defense arXiv Apr 2, 2026 · 4d ago

Combating Data Laundering in LLM Training

Muxing Li, Zesheng Ye, Sharon Li et al. · University of Melbourne · University of Wisconsin-Madison

Detects unauthorized LLM training data use even when original data has been laundered through style transformations

Membership Inference Attack Sensitive Information Disclosure nlp
PDF
defense arXiv Apr 2, 2026 · 4d ago

Diffusion-Guided Adversarial Perturbation Injection for Generalizable Defense Against Facial Manipulations

Yue Li, Linying Xue, Kaiqing Lin et al. · National Huaqiao University · Shenzhen University +2 more

Diffusion-guided adversarial perturbation defense protecting facial images from deepfake manipulation in both white-box and black-box settings

Input Manipulation Attack vision generative
PDF
attack arXiv Apr 2, 2026 · 4d ago

Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models

Jiawei Chen, Simin Huang, Jiawei Du et al. · East China Normal University · Zhongguancun Academy +3 more

Physically realizable 3D adversarial textures that degrade vision-language-action robot models with 96.7% task failure rates

Input Manipulation Attack vision multimodal nlp
PDF Code
attack arXiv Apr 2, 2026 · 4d ago

CRaFT: Circuit-Guided Refusal Feature Selection via Cross-Layer Transcoders

Su-Hyeon Kim, Hyundong Jin, Yejin Lee et al. · Yonsei University

Circuit-guided feature selection for LLM jailbreaking that identifies causal refusal features via cross-layer transcoders and boundary prompts

Prompt Injection nlp
PDF
defense arXiv Apr 2, 2026 · 4d ago

Moiré Video Authentication: A Physical Signature Against AI Video Generation

Yuan Qing, Kunyu Zheng, Lingxiao Li et al. · Boston University

Physics-based video authentication using Moiré interference patterns that real cameras produce but AI generators cannot faithfully reproduce

Output Integrity Attack vision generative
PDF
defense arXiv Apr 2, 2026 · 4d ago

From Component Manipulation to System Compromise: Understanding and Detecting Malicious MCP Servers

Yiheng Huang, Zhijia Zhao, Bihuan Chen et al. · Fudan University

Constructs dataset of 114 malicious MCP servers exploiting LLM tool-calling and proposes behavioral deviation detector achieving 94.6% F1

Insecure Plugin Design nlp
PDF
attack arXiv Apr 2, 2026 · 4d ago

Low-Effort Jailbreak Attacks Against Text-to-Image Safety Filters

Ahmed B Mustafa, Zihan Ye, Yang Lu et al. · University of Nottingham · Xi’an Jiaotong-Liverpool University +1 more

Low-effort prompt-based jailbreaks bypass text-to-image safety filters using linguistic reframing, achieving 74% attack success

Prompt Injection multimodal generative
PDF
attack arXiv Apr 1, 2026 · 5d ago

Out of Sight, Out of Track: Adversarial Attacks on Propagation-based Multi-Object Trackers via Query State Manipulation

Halima Bouzidi, Haoyu Liu, Yonatan Gizachew Achamyeleh et al. · University of California

Adversarial attacks on multi-object trackers that flood query budgets and corrupt temporal memory to force track terminations

Input Manipulation Attack vision
PDF
attack arXiv Apr 1, 2026 · 5d ago

Enhancing Gradient Inversion Attacks in Federated Learning via Hierarchical Feature Optimization

Hao Fang, Wenbo Yu, Bin Chen et al. · Tsinghua University · Harbin Institute of Technology

GAN-based gradient inversion attack reconstructing client training data from FL gradients via hierarchical feature optimization

Model Inversion Attack vision federated-learning
PDF
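The core idea behind gradient inversion, independent of this paper's hierarchical feature optimization, is that gradients shared in federated learning can leak the training input itself. A minimal NumPy sketch of the well-known exact case for a linear layer with bias (all names and numbers here are illustrative, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 16, 4
W = rng.normal(size=(d_out, d_in))      # shared model weights
b = rng.normal(size=d_out)
x_true = rng.normal(size=d_in)          # client's private input
y = np.array([0.0, 1.0, 0.0, 0.0])     # one-hot label

# Client computes gradients of an MSE loss on logits z = Wx + b
z = W @ x_true + b
delta = 2.0 * (z - y)                   # dL/dz
grad_W = np.outer(delta, x_true)        # dL/dW = delta x^T
grad_b = delta                          # dL/db = delta

# Attacker: each row of grad_W equals grad_b[i] * x, so the private
# input is recovered exactly with a single division.
i = int(np.argmax(np.abs(grad_b)))      # pick a row with nonzero delta
x_rec = grad_W[i] / grad_b[i]

print(np.allclose(x_rec, x_true))       # True: exact reconstruction
```

Deeper networks break this closed form, which is why practical attacks like the one above's GAN-based variant instead optimize a dummy input until its gradient matches the observed one.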
defense arXiv Apr 1, 2026 · 5d ago

TRACE: Training-Free Partial Audio Deepfake Detection via Embedding Trajectory Analysis of Speech Foundation Models

Awais Khan, Muhammad Umar Farooq, Kutub Uddin et al. · University of Michigan-Flint

Training-free partial audio deepfake detector using speech foundation model embedding dynamics, achieving competitive performance without labeled data

Output Integrity Attack audio
PDF
defense arXiv Apr 1, 2026 · 5d ago

RAGShield: Provenance-Verified Defense-in-Depth Against Knowledge Base Poisoning in Government Retrieval-Augmented Generation Systems

KrishnaSaiReddy Patil

Defense-in-depth framework using cryptographic provenance verification to block knowledge base poisoning attacks in government RAG systems

Data Poisoning Attack Training Data Poisoning nlp
PDF
defense arXiv Apr 1, 2026 · 5d ago

Shapley-Guided Neural Repair Approach via Derivative-Free Optimization

Xinyu Sun, Wanwei Liu, Haoang Chi et al. · National University of Defense Technology · Nanjing University +1 more

Interpretable DNN repair using Shapley-guided fault localization and derivative-free optimization for backdoor removal, adversarial defense, and fairness

Input Manipulation Attack Model Poisoning vision
PDF
attack arXiv Apr 1, 2026 · 5d ago

G-Drift MIA: Membership Inference via Gradient-Induced Feature Drift in LLMs

Ravi Ranjan, Utkarsh Grover, Xiaomin Lin et al. · Florida International University · University of South Florida

White-box membership inference attack using gradient-induced feature drift, outperforming confidence-based and reference-based MIAs on LLMs

Membership Inference Attack nlp
PDF
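For context, the confidence- and loss-based baselines such attacks are compared against reduce to a simple threshold test: members of the training set tend to incur lower loss than non-members. A toy NumPy sketch with synthetic loss values (not real LLM losses):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in: members (seen in training) draw lower losses
# on average than non-members.
member_loss = rng.gamma(shape=2.0, scale=0.5, size=1000)
nonmember_loss = rng.gamma(shape=2.0, scale=1.0, size=1000)

losses = np.concatenate([member_loss, nonmember_loss])
is_member = np.concatenate([np.ones(1000), np.zeros(1000)])

# Loss-threshold MIA: predict "member" when loss falls below a threshold
tau = np.median(losses)
pred = (losses < tau).astype(float)
acc = (pred == is_member).mean()
print(acc > 0.5)   # beats random guessing on this toy split
```

The attacks in this feed improve on this baseline by using richer signals (gradient-induced drift, agentic strategy search) rather than a single scalar loss.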
attack arXiv Apr 1, 2026 · 5d ago

Adversarial Attenuation Patch Attack for SAR Object Detection

Yiming Zhang, Weibo Qin, Feng Wang · Fudan University

Adversarial patch attack on SAR target detection achieving stealthiness and physical realizability through energy-constrained optimization

Input Manipulation Attack vision
PDF Code
defense arXiv Apr 1, 2026 · 5d ago

WARP: Guaranteed Inner-Layer Repair of NLP Transformers

Hsin-Ling Hsu, Min-Yu Chen, Nai-Chia Chen et al. · National Chengchi University

Constraint-based model repair framework providing provable guarantees for correcting adversarial misclassifications in NLP Transformers

Input Manipulation Attack nlp
PDF
survey arXiv Apr 1, 2026 · 5d ago

Safety, Security, and Cognitive Risks in World Models

Manoj Parmar · SovereignAI Security Labs

Unified threat model for world model AI systems covering adversarial attacks, data poisoning, alignment risks, and cognitive security

Input Manipulation Attack Data Poisoning Attack Model Poisoning Prompt Injection Excessive Agency reinforcement-learning multimodal vision nlp
PDF
benchmark arXiv Apr 1, 2026 · 5d ago

ClawSafety: "Safe" LLMs, Unsafe Agents

Bowen Wei, Yunbei Zhang, Jinhao Pan et al. · George Mason University · Tulane University +2 more

Benchmark of 120 prompt injection attacks on personal AI agents across skill files, emails, and web content

Prompt Injection Excessive Agency nlp multimodal
PDF
attack arXiv Apr 1, 2026 · 5d ago

AutoMIA: Improved Baselines for Membership Inference Attack via Agentic Self-Exploration

Ruhao Liu, Weiqi Huang, Qi Li et al. · National University of Singapore

Agentic framework that automates membership inference attacks through self-exploration and strategy evolution, outperforming handcrafted baselines

Membership Inference Attack
PDF Code
defense arXiv Apr 1, 2026 · 5d ago

SelfGrader: Stable Jailbreak Detection for Large Language Models using Token-Level Logits

Zikai Zhang, Rui Hu, Olivera Kotevska et al. · University of Nevada · Oak Ridge National Laboratory

Detects LLM jailbreak attacks using logit distributions over numerical tokens, achieving 22.66% ASR reduction with minimal overhead

Prompt Injection nlp
PDF
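The general idea of scoring with logits over numerical tokens, sketched here with made-up numbers rather than SelfGrader's actual prompts or logits: ask the model to grade a pending response on a 1-5 harmfulness scale, then read the full logit distribution over the digit tokens instead of a single sampled digit, which yields a soft, stable score.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# Hypothetical logits the LLM assigns to digit tokens "1".."5" at the
# answer position of a self-grading prompt (illustrative values only).
benign_logits = np.array([4.1, 2.0, 0.3, -1.0, -2.5])    # mass on "1"
jailbreak_logits = np.array([-1.5, 0.2, 1.0, 2.8, 3.9])  # mass on "5"

def expected_score(digit_logits):
    # Expected grade under the model's own token distribution.
    p = softmax(digit_logits)
    return float(p @ np.arange(1, 6))

THRESHOLD = 3.0   # flag responses the model itself grades as harmful
print(expected_score(benign_logits) < THRESHOLD)      # True
print(expected_score(jailbreak_logits) >= THRESHOLD)  # True
```

Averaging over the distribution rather than taking the argmax digit is what makes such a detector robust to near-tied logits.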