Latest papers

186 papers
benchmark arXiv Apr 4, 2026 · 4d ago

ActivityForensics: A Comprehensive Benchmark for Localizing Manipulated Activity in Videos

Peijun Bao, Anwei Luo, Gang Pan et al. · Zhejiang University · Nanyang Technological University +4 more

Benchmark dataset and diffusion-based detector for localizing AI-manipulated activity segments seamlessly inserted into authentic videos

Output Integrity Attack visionmultimodal
PDF Code
benchmark arXiv Apr 3, 2026 · 5d ago

Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study

Zhihao Chen, Ying Zhang, Yi Liu et al. · Fujian Normal University · Wake Forest University +7 more

Large-scale analysis of 17K LLM agent skills finding 520 vulnerable to credential leakage via debug logging and prompt injection

AI Supply Chain Attacks Prompt Injection Insecure Plugin Design nlp
PDF
attack arXiv Apr 3, 2026 · 5d ago

Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems

Yubin Qu, Yi Liu, Tongcheng Geng et al. · Griffith University · Quantstamp +6 more

Supply-chain attack embedding malicious payloads in LLM agent skill documentation, achieving up to 33.5% bypass of defenses

AI Supply Chain Attacks Insecure Plugin Design Excessive Agency nlp
PDF
attack arXiv Mar 31, 2026 · 8d ago

Adversarial Prompt Injection Attack on Multimodal Large Language Models

Meiwen Ding, Song Xia, Chenqi Kong et al. · Nanyang Technological University

Embeds imperceptible adversarial prompts into images via visual perturbations to jailbreak closed-source multimodal LLMs

Input Manipulation Attack Prompt Injection multimodalvisionnlp
PDF
defense arXiv Mar 25, 2026 · 14d ago

High-Fidelity Face Content Recovery via Tamper-Resilient Versatile Watermarking

Peipeng Yu, Jinfeng Xie, Chengfu Ou et al. · Jinan University · University of Macau +2 more

Embeds semantic watermarks in face images for copyright protection, pixel-level deepfake localization, and content recovery after manipulation

Output Integrity Attack visiongenerative
PDF
defense arXiv Mar 25, 2026 · 14d ago

Enhancing and Reporting Robustness Boundary of Neural Code Models for Intelligent Code Understanding

Tingxu Han, Wei Song, Weisong Sun et al. · Nanjing University · University of New South Wales +2 more

Black-box certified defense for code models using randomized smoothing to reduce adversarial attack success from 42% to 9.74%

Input Manipulation Attack nlp
PDF
defense arXiv Mar 25, 2026 · 14d ago

Unleashing Vision-Language Semantics for Deepfake Video Detection

Jiawen Zhu, Yunqi Miao, Xueyi Zhang et al. · The University of Warwick · Nanyang Technological University +2 more

Deepfake detector leveraging CLIP's vision-language semantics with identity-aware prompting to achieve fine-grained forgery localization

Output Integrity Attack visionmultimodal
PDF Code
defense arXiv Mar 24, 2026 · 15d ago

SafeSeek: Universal Attribution of Safety Circuits in Language Models

Miao Yu, Siyuan Fu, Moayad Aloqaily et al. · University of Science and Technology of China · Squirrel AI Learning +4 more

Mechanistic interpretability framework identifying sparse safety circuits in LLMs for backdoor removal and alignment preservation

Model Poisoning Input Manipulation Attack Prompt Injection nlp
PDF
defense arXiv Mar 23, 2026 · 16d ago

Principled Steering via Null-space Projection for Jailbreak Defense in Vision-Language Models

Xingyu Zhu, Beier Zhu, Shuo Wang et al. · University of Science and Technology of China · National University of Singapore +1 more

Null-space projection defense that blocks VLM jailbreaks while preserving benign performance through theoretically-grounded activation steering

Input Manipulation Attack Prompt Injection multimodalvisionnlp
PDF
attack arXiv Mar 20, 2026 · 19d ago

Evolving Jailbreaks: Automated Multi-Objective Long-Tail Attacks on Large Language Models

Wenjing Hong, Zhonghua Rong, Li Wang et al. · Shenzhen University · Ltd +2 more

Automated multi-objective evolutionary search framework discovering diverse long-tail jailbreak attacks via encryption-decryption prompt transformations

Prompt Injection nlp
PDF
attack arXiv Mar 15, 2026 · 24d ago

Membership Inference for Contrastive Pre-training Models with Text-only PII Queries

Ruoxi Cheng, Yizhong Ding, Hongyi Zhang et al. · Beijing Electronic Science and Technology Institute · Alibaba Group +2 more

Text-only membership inference attack on CLIP/CLAP models that detects PII memorization without exposing biometric data

Membership Inference Attack multimodalvisionaudionlp
PDF
defense arXiv Mar 13, 2026 · 26d ago

Test-Time Attention Purification for Backdoored Large Vision Language Models

Zhifang Zhang, Bojun Yang, Shuo He et al. · Southeast University · Nanyang Technological University +2 more

Test-time backdoor defense for LVLMs that detects poisoned inputs via cross-modal attention anomalies and purifies them by pruning trigger tokens

Model Poisoning multimodalnlpvision
PDF
defense arXiv Mar 13, 2026 · 26d ago

Spectral Defense Against Resource-Targeting Attack in 3D Gaussian Splatting

Yang Chen, Yi Yu, Jiaming He et al. · Nanyang Technological University · UESTC +3 more

Spectral filtering defense against data poisoning attacks that cause excessive Gaussian growth in 3D scene reconstruction

Data Poisoning Attack vision
PDF
defense arXiv Mar 13, 2026 · 26d ago

STRAP-ViT: Segregated Tokens with Randomized -- Transformations for Defense against Adversarial Patches in ViTs

Nandish Chattopadhyay, Anadi Goyal, Chandan Karfa et al. · Indian Institute of Technology · Nanyang Technological University

Detects and neutralizes adversarial patches on ViTs by identifying anomalous tokens and applying randomized transformations

Input Manipulation Attack vision
PDF
attack arXiv Mar 13, 2026 · 26d ago

CtrlAttack: A Unified Attack on World-Model Control in Diffusion Models

Shuhan Xu, Siyuan Liang, Hongling Zheng et al. · Wuhan University · Nanyang Technological University +1 more

Adversarial attack on diffusion I2V models that disrupts temporal consistency via low-dimensional velocity field perturbations

Input Manipulation Attack visiongenerative
PDF
attack arXiv Mar 12, 2026 · 27d ago

Delayed Backdoor Attacks: Exploring the Temporal Dimension as a New Attack Surface in Pre-Trained Models

Zikang Ding, Haomiao Yang, Meng Hao et al. · University of Electronic Science and Technology of China · Singapore Management University +2 more

Proposes temporally-delayed backdoor attacks on NLP pre-trained models using common everyday words as stealthy triggers

Model Poisoning nlp
PDF
benchmark arXiv Mar 12, 2026 · 27d ago

You Told Me to Do It: Measuring Instructional Text-induced Private Data Leakage in LLM Agents

Ching-Yu Kao, Xinfeng Li, Shenyu Dai et al. · Fraunhofer AISEC · Nanyang Technological University +3 more

Benchmarks documentation-embedded indirect prompt injection against high-privilege LLM agents, achieving 85% exfiltration success with 0% human detection rate

Prompt Injection Excessive Agency nlp
PDF
defense arXiv Mar 11, 2026 · 28d ago

AttriGuard: Defeating Indirect Prompt Injection in LLM Agents via Causal Attribution of Tool Invocations

Yu He, Haozhe Zhu, Yiming Li et al. · Zhejiang University · Nanyang Technological University +1 more

Runtime defense for LLM agents detecting indirect prompt injection via causal counterfactual analysis of tool invocations

Prompt Injection nlp
PDF Code
defense arXiv Mar 9, 2026 · 4w ago

Where, What, Why: Toward Explainable 3D-GS Watermarking

Mingshu Cai, Jiajun Li, Osamu Yoshie et al. · Waseda University · Southeast University +1 more

Watermarks 3D Gaussian Splatting assets with explainable carrier selection, improving visual quality by +0.83 dB and bit-accuracy by +1.24% over prior methods

Output Integrity Attack visiongenerative
PDF
defense arXiv Mar 9, 2026 · 4w ago

Privacy-Preserving End-to-End Full-Duplex Speech Dialogue Models

Nikita Kuzmin, Tao Zhong, Jiajun Deng et al. · Nanyang Technological University · A*STAR +3 more

Defends against speaker re-identification attacks on LLM speech dialogue models using streaming voice anonymization

Sensitive Information Disclosure audionlp
PDF
Loading more papers…