Latest papers

3 papers
defense arXiv Apr 5, 2026

CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks

Siyuan Li, Zehao Liu, Xi Lin et al. · Shanghai Jiao Tong University · University of Illinois Urbana-Champaign +1 more

Multi-agent cooperative defense system that adapts across rounds to counter evolving LLM jailbreak attacks through deception and forensic analysis

Prompt Injection Excessive Agency nlp
attack arXiv Feb 10, 2026

When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models

Jiacheng Hou, Yining Sun, Ruochong Jin et al. · Tsinghua University · Peng Cheng Laboratory +1 more

Visual-only jailbreak attack on image editing VLMs encodes malicious instructions via marks and arrows, achieving 80.9% attack success on commercial models

Prompt Injection vision multimodal generative
attack arXiv Nov 1, 2025

ToxicTextCLIP: Text-Based Poisoning and Backdoor Attacks on CLIP Pre-training

Xin Yao, Haiyang Zhao, Yimin Chen et al. · Central South University · University of Massachusetts Lowell

Text-modality poisoning and backdoor attack framework against CLIP pre-training, bypassing RoCLIP, CleanCLIP, and SafeCLIP defenses

Data Poisoning Attack Model Poisoning multimodal vision nlp