Xiang Zheng

Papers in Database (2)

tool arXiv Jan 6, 2025

CALM: Curiosity-Driven Auditing for Large Language Models

Xiang Zheng, Longxiang Wang, Yi Liu et al. · City University of Hong Kong · Fudan University +1 more

RL-based auditing tool that automatically discovers prompts eliciting toxic or politically sensitive outputs from black-box LLMs

Prompt Injection nlp
attack arXiv Sep 16, 2025

Defense-to-Attack: Bypassing Weak Defenses Enables Stronger Jailbreaks in Vision-Language Models

Yunhan Zhao, Xiang Zheng, Xingjun Ma · Fudan University · City University of Hong Kong

Bimodal VLM jailbreak that uses weak-defense patterns as attack guides, achieving an 80% single-shot attack success rate (ASR) via adversarial visual and textual optimization

Input Manipulation Attack Prompt Injection vision nlp multimodal