Huijia Zhu

attack arXiv Jan 3, 2025 · Jan 2025

Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models

Yanjiang Liu, Shuhen Zhou, Yaojie Lu et al. · Institute of Software · University of Chinese Academy of Sciences +1 more

RL-based automated red-teaming framework that optimizes jailbreak strategies against LLMs, achieving 16.63% higher attack success rates than existing methods

Prompt Injection nlp

tool arXiv Feb 9, 2026 · Feb 2026

VideoVeritas: AI-Generated Video Detection via Perception Pretext Reinforcement Learning

Hao Tan, Jun Lan, Senyuan Shi et al. · Institute of Automation · Ant Group +2 more

Detects AI-generated videos using MLLMs enhanced with perception pretext reinforcement learning and a new 3K-video benchmark

Output Integrity Attack vision multimodal nlp

defense arXiv Aug 28, 2025 · Aug 2025

Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning

Hao Tan, Jun Lan, Zichang Tan et al. · University of Chinese Academy of Sciences · Chinese Academy of Sciences +1 more

Proposes Veritas, an MLLM-based deepfake detector that uses pattern-aware reasoning, together with the HydraFake benchmark for hierarchical out-of-distribution (OOD) evaluation

Output Integrity Attack vision multimodal

defense arXiv Sep 12, 2025 · Sep 2025

GAMMA: Generalizable Alignment via Multi-task and Manipulation-Augmented Training for AI-Generated Image Detection

Haozhen Yan, Yan Hong, Suning Lang et al. · Shanghai Jiao Tong University · Ant Group

Multi-task detection framework with manipulation augmentation that achieves state-of-the-art AI-generated image detection and generalizes to unseen generative models

Output Integrity Attack vision
