Latest papers

4 papers
attack · arXiv · Oct 9, 2025

VisualDAN: Exposing Vulnerabilities in VLMs with Visual-Driven DAN Commands

Aofan Liu, Lulu Tang · Beijing Academy of Artificial Intelligence · Peking University

An adversarial image attack that embeds DAN jailbreak commands to bypass safety guardrails in aligned VLMs such as LLaVA and InstructBLIP

Input Manipulation Attack · Prompt Injection · vision · nlp · multimodal
attack · arXiv · Oct 9, 2025

AutoRed: A Free-form Adversarial Prompt Generation Framework for Automated Red Teaming

Muxi Diao, Yutao Mou, Keqing He et al. · Beijing University of Posts and Telecommunications · Peking University +1 more

A seed-free LLM red-teaming framework that uses persona-guided generation and reflection loops to produce diverse jailbreak prompts with a high attack success rate (ASR)

Prompt Injection · nlp
defense · arXiv · Sep 30, 2025

OmniDFA: A Unified Framework for Open Set Synthesis Image Detection and Few-Shot Attribution

Shiyu Wu, Shuyan Li, Jing Li et al. · Chinese Academy of Sciences · Beijing Academy of Artificial Intelligence +3 more

Proposes an open-set, few-shot framework that jointly detects AI-generated images and attributes them to their source generative models

Output Integrity Attack · vision · generative
benchmark · arXiv · Sep 6, 2025

MFFI: Multi-Dimensional Face Forgery Image Dataset for Real-World Scenarios

Changtao Miao, Yi Zhang, Man Luo et al. · Ant Group · Anhui Province Key Laboratory of Digital Security +4 more

Proposes a 1024K-image deepfake benchmark dataset spanning 50 forgery methods and real-world degradations for evaluating face forgery detection

Output Integrity Attack · vision · generative