Jiawei Zhang

tool · arXiv · Oct 3, 2025

Zhaorun Chen, Xun Liu, Mintong Kang et al. · University of Chicago · University of Illinois +2 more

Adaptive agentic red-teaming system jailbreaks VLMs with 11 multimodal attack strategies, exceeding 90% ASR on Claude-4-Sonnet

Input Manipulation Attack · Prompt Injection · multimodal · nlp

1 citation · PDF · Code

defense · arXiv · Oct 20, 2025

Jiawei Zhang, Andrew Estornell, David D. Baek et al. · ByteDance · University of Chicago +2 more

Inference-time defense reintroducing alignment tokens mid-generation to block jailbreaks and adversarial prefill attacks in LLMs

Input Manipulation Attack · Prompt Injection · nlp

Papers in Database (2)