Weiqiang Wang

Papers in Database (4)

attack arXiv Jan 3, 2025 · Jan 2025

Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models

Yanjiang Liu, Shuhen Zhou, Yaojie Lu et al. · Institute of Software · University of Chinese Academy of Sciences +1 more

RL-based automated red-teaming framework that optimizes jailbreak strategies against LLMs, achieving 16.63% higher attack success rates

Prompt Injection Red-Team Agents nlp
benchmark arXiv Apr 9, 2026 · 6w ago

AT-ADD: All-Type Audio Deepfake Detection Challenge Evaluation Plan

Yuankun Xie, Haonan Cheng, Jiayi Zhou et al. · Communication University of China · Ant Group +3 more

Benchmark challenge for detecting AI-generated speech, sound, singing, and music across diverse generation methods and real-world conditions

Output Integrity Attack audio multimodal nlp
defense arXiv Aug 28, 2025 · Aug 2025

Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning

Hao Tan, Jun Lan, Zichang Tan et al. · University of Chinese Academy of Sciences · Chinese Academy of Sciences +1 more

Proposes Veritas, an MLLM deepfake detector using pattern-aware reasoning, with HydraFake benchmark for hierarchical OOD evaluation

Output Integrity Attack vision multimodal
survey arXiv Mar 12, 2026 · 10w ago

Taming OpenClaw: Security Analysis and Mitigation of Autonomous LLM Agent Threats

Xinhao Deng, Yixiang Zhang, Jiaqing Wu et al. · Ant Group · Tsinghua University

Proposes a five-layer lifecycle security framework for autonomous LLM agents, analyzing prompt injection, supply chain, memory poisoning, and intent drift threats

Prompt Injection Insecure Plugin Design Excessive Agency nlp