Latest papers

124 papers
attack arXiv Apr 1, 2026 · 5d ago

AutoMIA: Improved Baselines for Membership Inference Attack via Agentic Self-Exploration

Ruhao Liu, Weiqi Huang, Qi Li et al. · National University of Singapore

Agentic framework that automates membership inference attacks through self-exploration and strategy evolution, outperforming handcrafted baselines

Membership Inference Attack
PDF Code
attack arXiv Mar 27, 2026 · 10d ago

R-PGA: Robust Physical Adversarial Camouflage Generation via Relightable 3D Gaussian Splatting

Tianrui Lou, Siyuan Liang, Jiawei Liang et al. · Sun Yat-Sen University · National University of Singapore

Physical adversarial camouflage attack on autonomous vehicles using relightable 3D Gaussian splatting for robustness across lighting and viewing angles

Input Manipulation Attack vision
PDF
attack arXiv Mar 27, 2026 · 10d ago

PEANUT: Perturbations by Eigenvalue Alignment for Attacking GNNs Under Topology-Driven Message Passing

Bhavya Kohli, Biplab Sikdar · National University of Singapore

Black-box node injection attack on GNNs exploiting topology-driven message passing via eigenvalue alignment without requiring node features

Input Manipulation Attack graph
PDF
defense arXiv Mar 23, 2026 · 14d ago

Principled Steering via Null-space Projection for Jailbreak Defense in Vision-Language Models

Xingyu Zhu, Beier Zhu, Shuo Wang et al. · University of Science and Technology of China · National University of Singapore +1 more

Null-space projection defense that blocks VLM jailbreaks while preserving benign performance through theoretically-grounded activation steering

Input Manipulation Attack Prompt Injection multimodal vision nlp
PDF
tool arXiv Mar 19, 2026 · 18d ago

MedForge: Interpretable Medical Deepfake Detection via Forgery-aware Reasoning

Zhihui Chen, Kai He, Qingyuan Lei et al. · National University of Singapore · The Chinese University of Hong Kong +3 more

Detects medical image deepfakes via localize-then-analyze reasoning with expert-aligned explanations on synthetic lesion edits

Output Integrity Attack vision multimodal
PDF Code
tool arXiv Mar 19, 2026 · 18d ago

ClawTrap: A MITM-Based Red-Teaming Framework for Real-World OpenClaw Security Evaluation

Haochen Zhao, Shaoyang Cui · National University of Singapore · Tsinghua University

MITM-based red-teaming framework that tests autonomous web agent security through real-time network traffic manipulation attacks

Prompt Injection Excessive Agency nlp
PDF Code
defense arXiv Mar 18, 2026 · 19d ago

STEP: Detecting Audio Backdoor Attacks via Stability-based Trigger Exposure Profiling

Kun Wang, Meng Chen, Junhao Wang et al. · Zhejiang University · Xi’an Jiaotong University +1 more

Black-box backdoor detector for speech models exploiting dual stability anomalies under semantic-breaking and semantic-preserving perturbations

Model Poisoning audio
PDF
defense arXiv Mar 18, 2026 · 19d ago

Proof-of-Authorship for Diffusion-based AI Generated Content

De Zhang Lee, Han Fang, Ee-Chien Chang · National University of Singapore

Cryptographic proof-of-authorship for diffusion-generated images by binding generation seeds to author identity using pseudorandom functions

Output Integrity Attack vision generative
PDF
tool arXiv Mar 18, 2026 · 19d ago

VeriGrey: Greybox Agent Validation

Yuntong Zhang, Sungmin Kang, Ruijie Meng et al. · National University of Singapore · Max Planck Institute for Security and Privacy

Greybox fuzzing framework that discovers indirect prompt injection vulnerabilities in LLM agents by mutating prompts and tracking tool invocations

Prompt Injection Excessive Agency nlp
PDF
attack arXiv Mar 17, 2026 · 20d ago

REFORGE: Multi-modal Attacks Reveal Vulnerable Concept Unlearning in Image Generation Models

Yong Zou, Haoran Li, Fanxiao Li et al. · Yunnan University · Northeastern University +1 more

Black-box adversarial image prompt attack that bypasses concept unlearning in diffusion models, recovering erased copyrighted and harmful concepts

Input Manipulation Attack vision multimodal generative
PDF Code
defense arXiv Mar 14, 2026 · 23d ago

Towards Generalizable Deepfake Detection via Real Distribution Bias Correction

Ming-Hui Liu, Harry Cheng, Xin Luo et al. · Shandong University · National University of Singapore

Deepfake detector exploiting real image distribution invariance to generalize across unseen forgery types and domains

Output Integrity Attack vision
PDF
attack arXiv Mar 13, 2026 · 24d ago

Purify Once, Edit Freely: Breaking Image Protections under Model Mismatch

Qichen Zhao, Shengfang Zhai, Xinjian Bai et al. · Peking University · National University of Singapore +1 more

Defeats image protection schemes via purification attacks, removing adversarial perturbations to restore full editability under model mismatch

Output Integrity Attack vision generative
PDF
benchmark arXiv Mar 12, 2026 · 25d ago

You Told Me to Do It: Measuring Instructional Text-induced Private Data Leakage in LLM Agents

Ching-Yu Kao, Xinfeng Li, Shenyu Dai et al. · Fraunhofer AISEC · Nanyang Technological University +3 more

Benchmarks documentation-embedded indirect prompt injection against high-privilege LLM agents, achieving 85% exfiltration success with 0% human detection rate

Prompt Injection Excessive Agency nlp
PDF
defense arXiv Mar 12, 2026 · 25d ago

OrthoEraser: Coupled-Neuron Orthogonal Projection for Concept Erasure

Chuancheng Shi, Wenhua Wu, Fei Shen et al. · University of Sydney · National University of Singapore +2 more

Defends T2I diffusion models from adversarial induction of harmful content via orthogonal projection that preserves benign semantic subspaces during concept erasure

Prompt Injection vision generative
PDF
defense arXiv Mar 10, 2026 · 27d ago

When Detectors Forget Forensics: Blocking Semantic Shortcuts for Generalizable AI-Generated Image Detection

Chao Shuai, Zhenguang Liu, Shaojing Fan et al. · Zhejiang University · National University of Singapore +1 more

Proposes GSD module to block semantic shortcuts in VFM-based detectors, improving generalization to unseen AI-generated image pipelines

Output Integrity Attack vision generative
PDF Code
attack arXiv Mar 9, 2026 · 28d ago

Invisible Safety Threat: Malicious Finetuning for LLM via Steganography

Guangnian Wan, Xinyin Ma, Gongfan Fang et al. · National University of Singapore

Fine-tunes LLMs via API to covertly embed harmful content in steganographic cover responses, bypassing safety classifiers 100% of the time

Transfer Learning Attack Model Poisoning nlp
PDF Code
attack arXiv Mar 7, 2026 · 4w ago

Targeted Bit-Flip Attacks on LLM-Based Agents

Jialai Wang, Ya Wen, Zhongmou Liu et al. · National University of Singapore · Tsinghua University +1 more

Flip-Agent exploits hardware bit-flips to corrupt LLM agent weights, hijacking tool calls and final outputs in multi-stage pipelines

Model Poisoning Excessive Agency nlp
PDF
defense arXiv Mar 6, 2026 · 4w ago

Word-Anchored Temporal Forgery Localization

Tianyi Wang, Xi Shao, Harry Cheng et al. · National University of Singapore · Nanjing University of Posts and Telecommunications +1 more

Detects audio-visual deepfake segments via word-token binary classification, outperforming regression-based TFL baselines

Output Integrity Attack audio vision multimodal
PDF
attack arXiv Mar 3, 2026 · 4w ago

Scores Know Bob's Voice: Speaker Impersonation Attack

Chanwoo Hwang, Sunpill Kim, Yong Kiam Tan et al. · Hanyang University · A*STAR +2 more

Feature-aligned latent inversion achieves 91% speaker impersonation with 10x fewer black-box score queries

Input Manipulation Attack audio
PDF Code
defense arXiv Feb 28, 2026 · 5w ago

ProtegoFed: Backdoor-Free Federated Instruction Tuning with Interspersed Poisoned Data

Haodong Zhao, Jinming Hu, Zhaomin Wu et al. · Shanghai Jiao Tong University · National University of Singapore +1 more

Defends federated LLM instruction tuning against interspersed backdoor poisoning using frequency-domain gradient signals and global clustering

Model Poisoning Data Poisoning Attack nlp federated-learning
PDF Code