Weiqiang Wang

attack arXiv Jan 3, 2025 · Jan 2025

Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models

Yanjiang Liu, Shuhen Zhou, Yaojie Lu et al. · Institute of Software · University of Chinese Academy of Sciences +1 more

RL-based automated red-teaming framework that optimizes jailbreak strategies against LLMs, achieving 16.63% higher attack success rates

Prompt Injection Red-Team Agents nlp

benchmark arXiv Apr 9, 2026 · Apr 2026

AT-ADD: All-Type Audio Deepfake Detection Challenge Evaluation Plan

Yuankun Xie, Haonan Cheng, Jiayi Zhou et al. · Communication University of China · Ant Group +3 more

Benchmark challenge for detecting AI-generated speech, sound, singing, and music across diverse generation methods and real-world conditions

Output Integrity Attack audio multimodal nlp

defense arXiv Aug 28, 2025 · Aug 2025

Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning

Hao Tan, Jun Lan, Zichang Tan et al. · University of Chinese Academy of Sciences · Chinese Academy of Sciences +1 more

Proposes Veritas, an MLLM deepfake detector using pattern-aware reasoning, with HydraFake benchmark for hierarchical OOD evaluation

Output Integrity Attack vision multimodal

survey arXiv Mar 12, 2026 · Mar 2026

Taming OpenClaw: Security Analysis and Mitigation of Autonomous LLM Agent Threats

Xinhao Deng, Yixiang Zhang, Jiaqing Wu et al. · Ant Group · Tsinghua University

Proposes five-layer lifecycle security framework for autonomous LLM agents, analyzing prompt injection, supply chain, memory poisoning, and intent drift threats

Prompt Injection Insecure Plugin Design Excessive Agency nlp