Latest papers

134 papers
attack arXiv Apr 2, 2026 · 4d ago

Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models

Jiawei Chen, Simin Huang, Jiawei Du et al. · East China Normal University · Zhongguancun Academy +3 more

Physically realizable 3D adversarial textures that degrade vision-language-action robot models with 96.7% task failure rates

Input Manipulation Attack visionmultimodalnlp
PDF Code
attack arXiv Apr 1, 2026 · 5d ago

Enhancing Gradient Inversion Attacks in Federated Learning via Hierarchical Feature Optimization

Hao Fang, Wenbo Yu, Bin Chen et al. · Tsinghua University · Harbin Institute of Technology

GAN-based gradient inversion attack reconstructing client training data from FL gradients via hierarchical feature optimization

Model Inversion Attack visionfederated-learning
PDF
attack arXiv Mar 31, 2026 · 6d ago

Dummy-Aware Weighted Attack (DAWA): Breaking the Safe Sink in Dummy Class Defenses

Yunrui Yu, Xuxiang Feng, Pengda Qin et al. · Tsinghua University · University of Macau +1 more

Novel adversarial attack targeting dummy-class defenses by simultaneously attacking true and dummy labels with adaptive weighting

Input Manipulation Attack vision
PDF
attack arXiv Mar 30, 2026 · 7d ago

InkDrop: Invisible Backdoor Attacks Against Dataset Condensation

He Yang, Dongyi Lv, Song Ma et al. · Xi'an Jiaotong University · Tsinghua University

Stealthy backdoor attack on dataset condensation using boundary-proximate samples and imperceptible perturbations to evade detection

Model Poisoning vision
PDF Code
benchmark arXiv Mar 30, 2026 · 7d ago

Evaluating Privilege Usage of Agents on Real-World Tools

Quan Zhang, Lianhang Fu, Lvsi Lian et al. · East China Normal University · Xinjiang University +1 more

Benchmark evaluating LLM agents' privilege control under prompt injection attacks using real-world tools, finding 84.80% attack success

Prompt Injection Insecure Plugin Design Excessive Agency nlp
PDF
defense arXiv Mar 25, 2026 · 12d ago

Why the Maximum Second Derivative of Activations Matters for Adversarial Robustness

Yunrui Yu, Hang Su, Jun Zhu · Tsinghua University

Discovers optimal adversarial robustness occurs when activation function curvature falls within 4-10, revealing fundamental expressivity-sharpness trade-off

Input Manipulation Attack vision
PDF
defense arXiv Mar 24, 2026 · 13d ago

Chain-of-Authorization: Internalizing Authorization into Large Language Models via Reasoning Trajectories

Yang Li, Yule Liu, Xinlei He et al. · Tsinghua University · The Hong Kong University of Science and Technology +1 more

Fine-tunes LLMs to generate explicit authorization reasoning chains before responses, defending against unauthorized access and prompt injection

Prompt Injection Sensitive Information Disclosure nlp
PDF
defense arXiv Mar 20, 2026 · 17d ago

Neural Uncertainty Principle: A Unified View of Adversarial Fragility and LLM Hallucination

Dong-Xiao Zhang, Hu Lou, Jun-Jie Zhang et al. · Northwest Institute of Nuclear Technology · Tsinghua University +1 more

Unifies adversarial robustness and LLM hallucination under a geometric uncertainty principle, proposing defenses without adversarial training

Input Manipulation Attack Prompt Injection visionnlpmultimodal
PDF
tool arXiv Mar 19, 2026 · 18d ago

ClawTrap: A MITM-Based Red-Teaming Framework for Real-World OpenClaw Security Evaluation

Haochen Zhao, Shaoyang Cui · National University of Singapore · Tsinghua University

MITM-based red-teaming framework that tests autonomous web agent security through real-time network traffic manipulation attacks

Prompt Injection Excessive Agency nlp
PDF Code
attack arXiv Mar 16, 2026 · 21d ago

ClawWorm: Self-Propagating Attacks Across LLM Agent Ecosystems

Yihao Zhang, Zeming Wei, Xiaokun Luan et al. · Peking University · Sun Yat-Sen University +3 more

Self-replicating worm attack on LLM agent ecosystems achieving autonomous propagation through configuration hijacking and broadcast infection

AI Supply Chain Attacks Prompt Injection Excessive Agency nlpmultimodal
PDF
attack arXiv Mar 14, 2026 · 23d ago

Sirens' Whisper: Inaudible Near-Ultrasonic Jailbreaks of Speech-Driven LLMs

Zijian Ling, Pingyi Hu, Xiuyong Gao et al. · Huazhong University of Science and Technology · Tsinghua University +1 more

Inaudible near-ultrasonic acoustic channel attack that delivers jailbreak prompts to speech-driven LLMs through commodity hardware

Input Manipulation Attack Prompt Injection nlpaudiomultimodal
PDF
defense arXiv Mar 13, 2026 · 24d ago

Why Neural Structural Obfuscation Can't Kill White-Box Watermarks for Good!

Yanna Jiang, Guangsheng Yu, Qingyuan Yu et al. · University of Technology Sydney · Independent +2 more

Defeats Neural Structural Obfuscation attacks on model watermarks by canonicalizing neural networks to restore watermark verification

Model Theft vision
PDF Code
defense arXiv Mar 13, 2026 · 24d ago

Spectral Defense Against Resource-Targeting Attack in 3D Gaussian Splatting

Yang Chen, Yi Yu, Jiaming He et al. · Nanyang Technological University · UESTC +3 more

Spectral filtering defense against data poisoning attacks that cause excessive Gaussian growth in 3D scene reconstruction

Data Poisoning Attack vision
PDF
survey arXiv Mar 12, 2026 · 25d ago

Taming OpenClaw: Security Analysis and Mitigation of Autonomous LLM Agent Threats

Xinhao Deng, Yixiang Zhang, Jiaqing Wu et al. · Ant Group · Tsinghua University

Proposes five-layer lifecycle security framework for autonomous LLM agents, analyzing prompt injection, supply chain, memory poisoning, and intent drift threats

Prompt Injection Insecure Plugin Design Excessive Agency nlp
PDF
defense arXiv Mar 7, 2026 · 4w ago

mAVE: A Watermark for Joint Audio-Visual Generation Models

Luyang Si, Leyi Pan, Lijie Wen · Tsinghua University

Cryptographic watermark binds audio-video latents in joint generative models to block deepfake Swap Attacks with >99% integrity

Output Integrity Attack multimodalaudiovisiongenerative
PDF
attack arXiv Mar 7, 2026 · 4w ago

Targeted Bit-Flip Attacks on LLM-Based Agents

Jialai Wang, Ya Wen, Zhongmou Liu et al. · National University of Singapore · Tsinghua University +1 more

Flip-Agent exploits hardware bit-flips to corrupt LLM agent weights, hijacking tool calls and final outputs in multi-stage pipelines

Model Poisoning Excessive Agency nlp
PDF
attack arXiv Mar 6, 2026 · 4w ago

Depth Charge: Jailbreak Large Language Models from Deep Safety Attention Heads

Jinman Wu, Yi Xie, Shiqian Zhao et al. · Xidian University · Tsinghua University +1 more

White-box jailbreak targeting LLM attention heads via layer-wise perturbation, improving ASR 14% over SOTA

Input Manipulation Attack Prompt Injection nlp
PDF Code
attack arXiv Mar 6, 2026 · 4w ago

Knowing without Acting: The Disentangled Geometry of Safety Mechanisms in Large Language Models

Jinman Wu, Yi Xie, Shen Lin et al. · Xidian University · Tsinghua University +2 more

Discovers two disentangled safety subspaces in LLMs and exploits them to surgically disable refusal while preserving harmfulness recognition

Prompt Injection nlp
PDF Code
benchmark arXiv Mar 5, 2026 · 4w ago

ThaiSafetyBench: Assessing Language Model Safety in Thai Cultural Contexts

Trapoom Ukarapol, Nut Chukamphaeng, Kunat Pipatanakul et al. · SCB DataX · Tsinghua University +2 more

Benchmarks 24 LLMs against Thai-language jailbreaks, finding culturally-grounded attacks bypass safety guardrails at higher rates than general prompts

Prompt Injection nlp
PDF Code
attack arXiv Mar 5, 2026 · 4w ago

From Unfamiliar to Familiar: Detecting Pre-training Data via Gradient Deviations in Large Language Models

Ruiqi Zhang, Lingxiang Wang, Hainan Zhang et al. · Beihang University · Tsinghua University

Detects LLM pre-training data via gradient deviation scores capturing update magnitude, location, and concentration in FFN/Attention modules

Membership Inference Attack nlp
PDF
Loading more papers…