Weiqiang Wang

Papers in Database (3)

attack arXiv Jan 3, 2025

Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models

Yanjiang Liu, Shuhen Zhou, Yaojie Lu et al. · Institute of Software · University of Chinese Academy of Sciences +1 more

RL-based automated red-teaming framework that optimizes jailbreak strategies against LLMs, achieving 16.63% higher attack success rates

Prompt Injection nlp
defense arXiv Aug 28, 2025

Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning

Hao Tan, Jun Lan, Zichang Tan et al. · University of Chinese Academy of Sciences · Chinese Academy of Sciences +1 more

Proposes Veritas, an MLLM deepfake detector using pattern-aware reasoning, with the HydraFake benchmark for hierarchical OOD evaluation

Output Integrity Attack vision multimodal
survey arXiv Mar 12, 2026

Taming OpenClaw: Security Analysis and Mitigation of Autonomous LLM Agent Threats

Xinhao Deng, Yixiang Zhang, Jiaqing Wu et al. · Ant Group · Tsinghua University

Proposes a five-layer lifecycle security framework for autonomous LLM agents, analyzing prompt injection, supply chain, memory poisoning, and intent drift threats

Prompt Injection Insecure Plugin Design Excessive Agency nlp