Yujia Hu

benchmark arXiv Sep 18, 2025

Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore's Low-Resource Languages

Yujia Hu, Ming Shan Hee, Preslav Nakov et al. · Singapore University of Technology and Design · Mohamed bin Zayed University of Artificial Intelligence

Benchmarks multilingual LLM safety guardrails via red-teaming across Singlish, Chinese, Malay, and Tamil toxic prompts

Prompt Injection nlp


attack arXiv Mar 23, 2026

Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models

Rui Yang Tan, Yujia Hu, Roy Ka-Wei Lee · Singapore University of Technology and Design

Comic-based jailbreak attacks on vision-language models achieve 90%+ success by embedding harmful prompts in three-panel visual narratives

Input Manipulation Attack Prompt Injection multimodal vision nlp

