ML Security Papers

Latest papers

4 papers

attack arXiv Oct 9, 2025 · Oct 2025

Aofan Liu, Lulu Tang · Beijing Academy of Artificial Intelligence · Peking University

Adversarial image attack embeds DAN jailbreak commands to bypass safety guardrails in aligned VLMs like LLaVA and InstructBLIP

Input Manipulation Attack Prompt Injection vision nlp multimodal

attack arXiv Oct 9, 2025 · Oct 2025

Muxi Diao, Yutao Mou, Keqing He et al. · Beijing University of Posts and Telecommunications · Peking University +1 more

Seed-free LLM red teaming framework using persona-guided generation and reflection loops to produce diverse jailbreak prompts with a high attack success rate (ASR)

Prompt Injection nlp

defense arXiv Sep 30, 2025 · Sep 2025

Shiyu Wu, Shuyan Li, Jing Li et al. · Chinese Academy of Sciences · Beijing Academy of Artificial Intelligence +3 more

Proposes an open-set few-shot framework that jointly detects AI-generated images and attributes them to source generative models

Output Integrity Attack vision generative

benchmark arXiv Sep 6, 2025 · Sep 2025

Changtao Miao, Yi Zhang, Man Luo et al. · Ant Group · Anhui Province Key Laboratory of Digital Security +4 more

Proposes a 1024K-image deepfake benchmark dataset spanning 50 forgery methods and real-world degradation for face forgery detection evaluation

Output Integrity Attack vision generative