Xiaojun Jia

h-index: 12 · 629 citations · 46 papers (total)

Papers in Database (5)

benchmark arXiv Dec 6, 2025 · Dec 2025

OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation

Xiaojun Jia, Jie Liao, Qi Guo et al. · Nanyang Technological University · BraneMatrix AI +7 more

Unified benchmark and toolbox evaluating 13 attack methods and 15 defenses against multimodal jailbreaks across 18 open- and closed-source MLLMs

Prompt Injection multimodal nlp vision
5 citations PDF Code
attack arXiv Oct 28, 2025 · Oct 2025

AutoPrompt: Automated Red-Teaming of Text-to-Image Models via LLM-Driven Adversarial Prompts

Yufan Liu, Wanqian Zhang, Huashan Chen et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +2 more

Black-box LLM-driven attack generates human-readable adversarial prompts that bypass T2I safety filters with 1000x speedup

Input Manipulation Attack vision nlp generative
2 citations PDF
defense arXiv Oct 10, 2025 · Oct 2025

SeCon-RAG: A Two-Stage Semantic Filtering and Conflict-Free Framework for Trustworthy RAG

Xiaonan Si, Meilin Zhu, Simeng Qin et al. · Institute of Software · University of Chinese Academy of Sciences +5 more

Defends RAG systems from corpus poisoning via two-stage semantic and conflict-aware filtering before LLM generation

Prompt Injection nlp
2 citations PDF
attack arXiv Dec 24, 2025 · Dec 2025

Casting a SPELL: Sentence Pairing Exploration for LLM Limitation-breaking

Yifan Huang, Xiaojun Jia, Wenbo Guo et al. · Nanyang Technological University · National University of Singapore

Jailbreak framework using sentence pairing achieves 84% attack success on GPT-4.1 for malicious code generation

Prompt Injection nlp
1 citation PDF
defense arXiv Nov 16, 2025 · Nov 2025

Beyond Pixels: Semantic-aware Typographic Attack for Geo-Privacy Protection

Jiayi Zhu, Yihao Huang, Yue Cao et al. · Xidian University · Ltd +5 more

Defends geo-privacy by embedding semantics-aware deceptive text overlays around images to mislead LVLMs into predicting wrong geolocations

Input Manipulation Attack Prompt Injection vision multimodal
PDF