Latest papers

13 papers
attack arXiv Mar 23, 2026

Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models

Rui Yang Tan, Yujia Hu, Roy Ka-Wei Lee · Singapore University of Technology and Design

Comic-based jailbreak attacks on vision-language models achieve 90%+ success by embedding harmful prompts in three-panel visual narratives

Input Manipulation Attack Prompt Injection multimodal vision nlp
PDF
attack arXiv Mar 17, 2026

Structured Semantic Cloaking for Jailbreak Attacks on Large Language Models

Xiaobing Sun, Perry Lam, Shaohua Li et al. · A*STAR · Singapore University of Technology and Design

Multi-dimensional jailbreak attack that fragments and disguises malicious intent across prompt segments to evade LLM safety mechanisms

Prompt Injection nlp
PDF
attack arXiv Jan 10, 2026

On the Adversarial Robustness of 3D Large Vision-Language Models

Chao Liu, Ngai-Man Cheung · Singapore University of Technology and Design

First adversarial attack study on 3D VLMs, proposing visual token perturbation and output token manipulation attacks to force harmful outputs

Input Manipulation Attack Prompt Injection vision nlp multimodal
PDF
benchmark arXiv Dec 17, 2025

How Do Semantically Equivalent Code Transformations Impact Membership Inference on LLMs for Code?

Hua Yang, Alejandro Velasco, Thanh Le-Cong et al. · North Carolina State University · William & Mary +1 more

Semantically equivalent code transformations, especially variable renaming, reduce membership inference success by 10% on code LLMs

Membership Inference Attack nlp
PDF
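The kind of semantics-preserving transformation this benchmark studies, variable renaming, can be sketched in a few lines. This is an illustrative toy (the class name, the `v0, v1, ...` naming scheme, and the sample snippet are my assumptions, not the paper's tooling): it rewrites every identifier in a Python snippet to an opaque name while leaving the program's behavior unchanged.

```python
import ast


class RenameVars(ast.NodeTransformer):
    """Rename identifiers to opaque names (v0, v1, ...).

    A semantics-preserving code transformation of the kind used to
    perturb samples before querying a membership-inference attack.
    """

    def __init__(self):
        self.mapping = {}  # original name -> opaque name

    def visit_Name(self, node):
        if node.id not in self.mapping:
            self.mapping[node.id] = f"v{len(self.mapping)}"
        node.id = self.mapping[node.id]
        return node


src = "total = 0\nfor item in items:\n    total = total + item\n"
tree = RenameVars().visit(ast.parse(src))
print(ast.unparse(tree))
# → v0 = 0
#   for v1 in v2:
#       v0 = v0 + v1
```

The transformed code computes the same sum, yet its token sequence differs from anything the model memorized, which is why such rewrites depress membership-inference success.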
defense arXiv Dec 8, 2025

Towards Sustainable Universal Deepfake Detection with Frequency-Domain Masking

Chandler Timm C. Doloriel, Habib Ullah, Kristian Hovde Liland et al. · Singapore University of Technology and Design · Norwegian University of Life Sciences

Frequency-domain masking training strategy improves universal deepfake detector generalization to unseen GAN and diffusion models with pruning robustness

Output Integrity Attack vision generative
PDF Code
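A rough illustration of frequency-domain masking as a training augmentation (the function name, the random mask policy, and the ratio are assumptions, not the paper's exact recipe): take the 2D FFT of an image, zero a random subset of frequency coefficients, and invert, so the detector cannot rely on any one spectral band.

```python
import numpy as np


def freq_mask(img: np.ndarray, mask_ratio: float,
              rng: np.random.Generator) -> np.ndarray:
    """Zero a random fraction of 2D FFT coefficients, then invert.

    Masking frequency bands during training discourages a deepfake
    detector from overfitting to generator-specific spectral artifacts,
    which is what lets it generalize to unseen GAN/diffusion models.
    """
    spec = np.fft.fft2(img)
    keep = rng.random(spec.shape) >= mask_ratio  # keep ~(1 - mask_ratio)
    return np.real(np.fft.ifft2(spec * keep))


rng = np.random.default_rng(0)
x = rng.random((32, 32))          # stand-in for a grayscale image
x_aug = freq_mask(x, mask_ratio=0.3, rng=rng)
print(x_aug.shape)  # → (32, 32)
```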
defense arXiv Oct 1, 2025

SVDefense: Effective Defense against Gradient Inversion Attacks via Singular Value Decomposition

Chenxiang Luo, David K.Y. Yau, Qun Song · City University of Hong Kong · Singapore University of Technology and Design

Defends federated learning clients against gradient inversion attacks by obfuscating gradients via truncated SVD

Model Inversion Attack vision audio federated-learning
PDF
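The truncated-SVD idea behind gradient obfuscation can be sketched as follows (a minimal sketch; the function name, rank choice, and sharing the reconstruction rather than the factors are my assumptions, not SVDefense's exact protocol): each client factorizes its per-layer gradient matrix and shares only a low-rank approximation, withholding the components that inversion attacks exploit to reconstruct inputs.

```python
import numpy as np


def obfuscate_gradient(grad: np.ndarray, k: int) -> np.ndarray:
    """Return a rank-k approximation of a gradient matrix via truncated SVD.

    Keeping only the top-k singular components preserves the dominant
    update direction while discarding the fine-grained structure that
    gradient-inversion attacks use to recover training data.
    """
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k, :]


rng = np.random.default_rng(0)
g = rng.standard_normal((64, 32))       # stand-in for one layer's gradient
g_obf = obfuscate_gradient(g, k=4)
print(g_obf.shape, np.linalg.matrix_rank(g_obf))  # → (64, 32) 4
```

The rank `k` trades utility against privacy: the shared matrix still approximates the true gradient in spectral norm, but its rank-4 structure carries far less information than the full 32-rank gradient.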
attack arXiv Sep 29, 2025

TokenSwap: Backdoor Attack on the Compositional Understanding of Large Vision-Language Models

Zhifang Zhang, Qiqi Tao, Jiaqi Lv et al. · Southeast University · Singapore University of Technology and Design +1 more

Stealthy backdoor attack on VLMs swaps subject-object token roles to evade perplexity-based detectors while maintaining high attack success rates

Model Poisoning vision nlp multimodal
PDF
defense EMNLP Sep 21, 2025

Localizing Malicious Outputs from CodeLLM

Mayukh Borana, Junyi Liang, Sai Sathiesh Rajan et al. · Singapore University of Technology and Design

FreqRank uses mutation-based frequency ranking to detect and localize backdoor triggers in Code LLM outputs

Model Poisoning nlp
PDF
benchmark arXiv Sep 18, 2025

Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore's Low-Resource Languages

Yujia Hu, Ming Shan Hee, Preslav Nakov et al. · Singapore University of Technology and Design · Mohamed bin Zayed University of Artificial Intelligence

Benchmarks multilingual LLM safety guardrails via red-teaming across Singlish, Chinese, Malay, and Tamil toxic prompts

Prompt Injection nlp
PDF Code
attack arXiv Sep 7, 2025

Beyond I'm Sorry, I Can't: Dissecting Large Language Model Refusal

Nirmalendu Prakash, Yeo Wei Jie, Amir Abdullah et al. · Singapore University of Technology and Design · Nanyang Technological University +2 more

Ablates the SAE latent features that mediate refusal in LLMs, producing mechanistically grounded jailbreaks via a three-stage pipeline

Prompt Injection nlp
PDF
benchmark arXiv Aug 26, 2025

Get Global Guarantees: On the Probabilistic Nature of Perturbation Robustness

Wenchuan Mu, Kwan Hui Lim · Singapore University of Technology and Design

Proposes 'tower robustness,' a hypothesis-testing metric for statistically rigorous pre-deployment adversarial robustness evaluation of DNNs

Input Manipulation Attack vision
PDF
attack arXiv Aug 6, 2025

Do Vision-Language Models Leak What They Learn? Adaptive Token-Weighted Model Inversion Attacks

Ngoc-Bao Nguyen, Sy-Tuyen Ho, Koh Jun Hao et al. · Singapore University of Technology and Design · College Park

Proposes adaptive token-weighted model inversion attacks that reconstruct private training images from vision-language models

Model Inversion Attack Sensitive Information Disclosure vision multimodal
PDF
defense arXiv Aug 4, 2025

Pigeon-SL: Robust Split Learning Framework for Edge Intelligence under Malicious Clients

Sangjun Park, Tony Q.S. Quek, Hyowoon Seo · Kwangwoon University · Singapore University of Technology and Design +1 more

Defends split learning against malicious clients via pigeonhole-based cluster partitioning that isolates and discards poisoned updates

Data Poisoning Attack federated-learning
PDF