Geguang Pu

Papers in Database (2)

defense · arXiv · Aug 5, 2025

Seeing It Before It Happens: In-Generation NSFW Detection for Diffusion-Based Text-to-Image Models

Fan Yang, Yihao Huang, Jiayi Zhu et al. · Huazhong University of Science and Technology · National University of Singapore +2 more

Defends diffusion T2I models against NSFW generation by classifying predicted noise mid-generation, robust to adversarial prompts

Output Integrity Attack · vision · generative
benchmark · arXiv · Mar 30, 2026

Evaluating Privilege Usage of Agents on Real-World Tools

Quan Zhang, Lianhang Fu, Lvsi Lian et al. · East China Normal University · Xinjiang University +1 more

Benchmark evaluating LLM agents' privilege control under prompt injection attacks using real-world tools, finding 84.80% attack success

Prompt Injection · Insecure Plugin Design · Excessive Agency · nlp