Latest papers

13 papers
attack arXiv Mar 23, 2026

Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models

Rui Yang Tan, Yujia Hu, Roy Ka-Wei Lee · Singapore University of Technology and Design

Comic-based jailbreak attacks on vision-language models achieve 90%+ success by embedding harmful prompts in three-panel visual narratives

Input Manipulation Attack Prompt Injection multimodal vision nlp
PDF
attack arXiv Mar 17, 2026

Structured Semantic Cloaking for Jailbreak Attacks on Large Language Models

Xiaobing Sun, Perry Lam, Shaohua Li et al. · A*STAR · Singapore University of Technology and Design

Multi-dimensional jailbreak attack that fragments and disguises malicious intent across prompt segments to evade LLM safety mechanisms

Prompt Injection nlp
PDF
attack arXiv Jan 10, 2026

On the Adversarial Robustness of 3D Large Vision-Language Models

Chao Liu, Ngai-Man Cheung · Singapore University of Technology and Design

First adversarial attack study on 3D VLMs, proposing visual token perturbation and output token manipulation attacks to force harmful outputs

Input Manipulation Attack Prompt Injection vision nlp multimodal
PDF
benchmark arXiv Dec 17, 2025

How Do Semantically Equivalent Code Transformations Impact Membership Inference on LLMs for Code?

Hua Yang, Alejandro Velasco, Thanh Le-Cong et al. · North Carolina State University · William & Mary +1 more

Semantically equivalent code transformations, especially variable renaming, reduce membership inference success by 10% on code LLMs

Membership Inference Attack nlp
PDF
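The kind of semantics-preserving transformation this benchmark studies, variable renaming, can be sketched in a few lines. This is an illustrative toy (the class name, the `v0, v1, ...` naming scheme, and the sample snippet are my assumptions, not the paper's tooling): it rewrites every identifier in a Python snippet to an opaque name while leaving the program's behavior unchanged.

```python
import ast


class RenameVars(ast.NodeTransformer):
    """Rename identifiers to opaque names (v0, v1, ...).

    A semantics-preserving code transformation of the kind used to
    perturb samples before querying a membership-inference attack.
    """

    def __init__(self):
        self.mapping = {}  # original name -> opaque name

    def visit_Name(self, node):
        if node.id not in self.mapping:
            self.mapping[node.id] = f"v{len(self.mapping)}"
        node.id = self.mapping[node.id]
        return node


src = "total = 0\nfor item in items:\n    total = total + item\n"
tree = RenameVars().visit(ast.parse(src))
print(ast.unparse(tree))
# → v0 = 0
#   for v1 in v2:
#       v0 = v0 + v1
```

The transformed code computes the same sum, yet its token sequence differs from anything the model memorized, which is why such rewrites depress membership-inference success.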
defense arXiv Dec 8, 2025

Towards Sustainable Universal Deepfake Detection with Frequency-Domain Masking

Chandler Timm C. Doloriel, Habib Ullah, Kristian Hovde Liland et al. · Singapore University of Technology and Design · Norwegian University of Life Sciences

Frequency-domain masking training strategy improves universal deepfake detector generalization to unseen GAN and diffusion models with pruning robustness

Output Integrity Attack vision generative
PDF Code
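A rough illustration of frequency-domain masking as a training augmentation (the function name, the random mask policy, and the ratio are assumptions, not the paper's exact recipe): take the 2D FFT of an image, zero a random subset of frequency coefficients, and invert, so the detector cannot rely on any one spectral band.

```python
import numpy as np


def freq_mask(img: np.ndarray, mask_ratio: float,
              rng: np.random.Generator) -> np.ndarray:
    """Zero a random fraction of 2D FFT coefficients, then invert.

    Masking frequency bands during training discourages a deepfake
    detector from overfitting to generator-specific spectral artifacts,
    which is what lets it generalize to unseen GAN/diffusion models.
    """
    spec = np.fft.fft2(img)
    keep = rng.random(spec.shape) >= mask_ratio  # keep ~(1 - mask_ratio)
    return np.real(np.fft.ifft2(spec * keep))


rng = np.random.default_rng(0)
x = rng.random((32, 32))          # stand-in for a grayscale image
x_aug = freq_mask(x, mask_ratio=0.3, rng=rng)
print(x_aug.shape)  # → (32, 32)
```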
defense arXiv Oct 1, 2025

SVDefense: Effective Defense against Gradient Inversion Attacks via Singular Value Decomposition

Chenxiang Luo, David K.Y. Yau, Qun Song · City University of Hong Kong · Singapore University of Technology and Design

Defends federated learning clients against gradient inversion attacks by obfuscating gradients via truncated SVD

Model Inversion Attack vision audio federated-learning
PDF
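The truncated-SVD idea behind gradient obfuscation can be sketched as follows (a minimal sketch; the function name, rank choice, and sharing the reconstruction rather than the factors are my assumptions, not SVDefense's exact protocol): each client factorizes its per-layer gradient matrix and shares only a low-rank approximation, withholding the components that inversion attacks exploit to reconstruct inputs.

```python
import numpy as np


def obfuscate_gradient(grad: np.ndarray, k: int) -> np.ndarray:
    """Return a rank-k approximation of a gradient matrix via truncated SVD.

    Keeping only the top-k singular components preserves the dominant
    update direction while discarding the fine-grained structure that
    gradient-inversion attacks use to recover training data.
    """
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k, :]


rng = np.random.default_rng(0)
g = rng.standard_normal((64, 32))       # stand-in for one layer's gradient
g_obf = obfuscate_gradient(g, k=4)
print(g_obf.shape, np.linalg.matrix_rank(g_obf))  # → (64, 32) 4
```

The rank `k` trades utility against privacy: the shared matrix still approximates the true gradient in spectral norm, but its rank-4 structure carries far less information than the full 32-rank gradient.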
attack arXiv Sep 29, 2025

TokenSwap: Backdoor Attack on the Compositional Understanding of Large Vision-Language Models

Zhifang Zhang, Qiqi Tao, Jiaqi Lv et al. · Southeast University · Singapore University of Technology and Design +1 more

Stealthy backdoor attack on VLMs swaps subject-object token roles to evade perplexity-based detectors while maintaining high attack success rates

Model Poisoning vision nlp multimodal
PDF
defense EMNLP Sep 21, 2025

Localizing Malicious Outputs from CodeLLM

Mayukh Borana, Junyi Liang, Sai Sathiesh Rajan et al. · Singapore University of Technology and Design

FreqRank uses mutation-based frequency ranking to detect and localize backdoor triggers in Code LLM outputs

Model Poisoning nlp
PDF
benchmark arXiv Sep 18, 2025

Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore's Low-Resource Languages

Yujia Hu, Ming Shan Hee, Preslav Nakov et al. · Singapore University of Technology and Design · Mohamed bin Zayed University of Artificial Intelligence

Benchmarks multilingual LLM safety guardrails via red-teaming across Singlish, Chinese, Malay, and Tamil toxic prompts

Prompt Injection nlp
PDF Code
attack arXiv Sep 7, 2025

Beyond I'm Sorry, I Can't: Dissecting Large Language Model Refusal

Nirmalendu Prakash, Yeo Wei Jie, Amir Abdullah et al. · Singapore University of Technology and Design · Nanyang Technological University +2 more

Ablates the SAE latent features that mediate refusal in LLMs, producing mechanistically grounded jailbreaks via a three-stage pipeline

Prompt Injection nlp
PDF
benchmark arXiv Aug 26, 2025

Get Global Guarantees: On the Probabilistic Nature of Perturbation Robustness

Wenchuan Mu, Kwan Hui Lim · Singapore University of Technology and Design

Proposes 'tower robustness,' a hypothesis-testing metric for statistically rigorous pre-deployment adversarial robustness evaluation of DNNs

Input Manipulation Attack vision
PDF
attack arXiv Aug 6, 2025

Do Vision-Language Models Leak What They Learn? Adaptive Token-Weighted Model Inversion Attacks

Ngoc-Bao Nguyen, Sy-Tuyen Ho, Koh Jun Hao et al. · Singapore University of Technology and Design · College Park

Proposes adaptive token-weighted model inversion attacks that reconstruct private training images from vision-language models

Model Inversion Attack Sensitive Information Disclosure vision multimodal
PDF
defense arXiv Aug 4, 2025

Pigeon-SL: Robust Split Learning Framework for Edge Intelligence under Malicious Clients

Sangjun Park, Tony Q.S. Quek, Hyowoon Seo · Kwangwoon University · Singapore University of Technology and Design +1 more

Defends split learning against malicious clients via pigeonhole-based cluster partitioning that isolates and discards poisoned updates

Data Poisoning Attack federated-learning
PDF