Jie Zhang

h-index: 9 · 411 citations · 20 papers (total)

Papers in Database (2)

attack · arXiv · Feb 5, 2026

Learning to Inject: Automated Prompt Injection via Reinforcement Learning

Xin Chen, Jie Zhang, Florian Tramèr · ETH Zürich

RL-trained 1.5B model generates universal, transferable prompt injection suffixes that compromise GPT, Claude, and Gemini agents

Prompt Injection · nlp
defense · arXiv · Oct 18, 2025

Patronus: Safeguarding Text-to-Image Models against White-Box Adversaries

Xinfeng Li, Shengyuan Pang, Jialin Wu et al. · Nanyang Technological University · Zhejiang University +1 more

Defends text-to-image diffusion models against white-box fine-tuning attacks via non-fine-tunable safety alignment and feature-level input moderation

Transfer Learning Attack · vision · generative